2D:4D Does Not Predict Economic Preferences: Evidence from a Large, Representative Sample

The digit ratio (2D:4D) is considered a proxy for testosterone exposure in utero, and there has been a recent surge of studies testing whether 2D:4D is associated with economic preferences. Although the results are not conclusive, previous studies have reported statistically significant correlations between 2D:4D and risk taking, altruism, positive reciprocity, negative reciprocity and trust. However, most previous studies have small sample sizes gathered from university students and there is also no consensus on the type of analysis (e.g., which hand to analyze or subgroup to focus on). We present results from a pre-registered large sample study testing if 2D:4D is associated with economic preferences. Data were collected in a representative sample of adults in the German Socioeconomic Panel-Innovation Sample (SOEP-IS), in a sample of about 3,450 respondents (about 5 times larger than the previously largest study in this field). We find no statistically significant association between 2D:4D and economic preferences in the largest study to this date on the topic.


Introduction
There is substantial variation in economic preferences between individuals. Some of this variation has been linked to cultural or situational factors such as gender norms (e.g. Gneezy et al., 2009) or reference points (e.g. Thaler and Johnson, 1990). Relatively recently, the potential role of hormones has received substantial interest. Hormones have organizational effects on the brain during fetal development (Arnold, 2009; Lombardo et al., 2012; Phoenix et al., 1959), and these effects are often hypothesized to have long-lasting effects on preferences and behaviors later in life. In particular, the sex steroid testosterone has been hypothesized to be associated with a wide range of economic preferences and decisions.
While some studies have explored circulating levels of testosterone or administered testosterone (e.g. Bos et al., 2010; Eisenegger et al., 2011; Sapienza et al., 2009; Zethraeus et al., 2009), with mixed results (see Dreber and Johannesson, 2019 for further discussion), others have focused on testosterone exposure in utero. Such exposure is hypothesized to impact brain development of the fetus and could thus potentially explain some of the individual heterogeneity in preferences (Baron-Cohen, 2002). Since testosterone exposure in utero cannot be measured directly after birth, researchers have looked for proxies of this exposure. The ratio of the length of the 2nd digit to the length of the 4th digit (2D:4D) on each hand (Manning et al., 1998) has been used as such a proxy.
2D:4D has been linked to a number of traits such as personality, sexual orientation and various cognitive abilities, but the results are often contradictory and there is also mixed evidence on the role of publication bias (see, e.g. Grimbos et al., 2010; Hönekopp and Schuster, 2010; Hönekopp and Watson, 2011; Puts et al., 2008; Voracek and Loibl, 2009; Voracek et al., 2011). Recently, there has been a surge in the number of papers testing for an association between 2D:4D and economic preferences or outcomes (e.g. Coates et al., 2009; Van Honk et al., 2012). On risk preferences alone, we count 18 papers.
Such relationships suggest that biological factors may partially affect economic activities through decisions. Several studies have also investigated direct associations with real economic outcomes. For example, Coates et al. (2009) reported an association between 2D:4D and the profitability of financial traders, which, if real, could operate through economic preferences but also through other channels, and Nicolaou et al. (2017) reported an association between 2D:4D and self-employment.
We contribute to this literature by testing the associations between 2D:4D and five economic preference measures in a large, representative sample. To do so, we integrated a 2D:4D and Economic Preferences module into the German Socioeconomic Panel-Innovation Sample (SOEP-IS) (Richter & Schupp, 2015). Our sample contains 3,433 adult respondents, which is about 5 times larger than the previously largest study in this field (Brañas-Garza et al., 2018). Our five measures of economic preferences are i) risk taking, ii) altruism, iii) positive reciprocity, iv) negative reciprocity and v) trust. The questions used to measure these economic preferences were adapted from the preference survey module of the Global Preference Survey (see the Methods section for further details about the preference measures and Supplementary Information Text for the complete list of items). We chose to focus on these five economic preference measures because of previous attempts to link these measures to 2D:4D, although a large proportion of this literature has focused on risk taking and altruism. All our tests are pre-registered to avoid "researcher degrees of freedom" affecting the results (e.g. from choosing ex-post which hand or subgroup to focus on).
Our hypotheses about the direction of the association between 2D:4D and each economic preference are based on the previous literature. For risk taking, the hypothesis is a negative relation to 2D:4D, which would indicate a positive association between prenatal testosterone exposure and risk taking. However, even though previous literature has also had this hypothesis, the results in this literature are far from conclusive (see Parslow et al., 2019 for a review). For example, the first paper on this topic reports a negative correlation between risk taking and left hand 2D:4D but finds no statistically significant correlation with the right hand 2D:4D, and the significant correlation can only be found in one sample and not another (Dreber & Hoffman, 2007). There are other examples of statistically significant negative correlations (e.g. Barel, 2019; Bönte et al., 2016; Brañas-Garza et al., 2018; Brañas-Garza and Rustichini, 2011; Garbarino et al., 2011; Stenstrom et al., 2011; Sytsma, 2014), though many of these only find statistically significant correlations in a subset of the population studied. There are thus also several null results (e.g. Alonso et al., 2018; Apicella et al., 2008; Aycinena et al., 2014; Barel, 2019; Bönte et al., 2016; Brañas-Garza et al., 2018; Chicaiza-Becerra and Garcia-Molina, 2017; Drichoutis and Nayga, 2015; Lima de Miranda et al., 2018; Neyse et al., 2020; Sapienza et al., 2009; Schipper, 2014; Stenstrom et al., 2011; Sytsma, 2014) and also one finding of a statistically significant positive correlation between 2D:4D and risk taking in a female subsample in one of the papers that found a statistically significant negative correlation in another subsample (Brañas-Garza & Rustichini, 2011). Based on the previous literature we test the hypothesis that there is a statistically significant negative association between 2D:4D and risk taking.
There are also several studies testing the association between 2D:4D and altruism, measured as giving in the dictator game (Brañas-Garza et al., 2019; Brañas-Garza et al., 2013; Buser, 2012; Galizzi & Nieboer, 2015; Millet & Dewitte, 2010). Although these previous results have been mixed, the main hypothesis in the previous literature has been a positive association between 2D:4D and altruism; this is also the hypothesis that we test. In the exploratory analyses we also test for a non-linear relationship between 2D:4D and altruism by including the squared 2D:4D, as done in some of the previous studies (e.g. Brañas-Garza et al., 2013; Galizzi and Nieboer, 2015).
When it comes to 2D:4D, positive reciprocity, negative reciprocity and trust, there are only a handful of earlier studies. Buser (2012), who explored the relation between 2D:4D and altruism, also studies the trust game and finds that low 2D:4D individuals send back statistically significantly less as the second player in the trust game, and thus are less reciprocal. This was the only previous study we were aware of testing for an association between 2D:4D and positive reciprocity before we wrote our pre-analysis plan.1 Based on this study we test the hypothesis that there is a statistically significant positive association between 2D:4D and positive reciprocity. Responder behavior in the ultimatum game can be interpreted as a measure of negative reciprocity. Buser (2012) also tests for an association between 2D:4D and responder behavior in the ultimatum game, but finds no statistically significant association. However, another study finds a statistically significant negative correlation between right hand 2D:4D and the minimum acceptance level in the ultimatum game (a higher minimum acceptance level suggests stronger negative reciprocity) (Van den Bergh & Dewitte, 2006). These are the only previous studies we are aware of testing for an association between 2D:4D and negative reciprocity. Based on the study with the statistically significant negative correlation (Van den Bergh & Dewitte, 2006) we test the hypothesis that there is a statistically significant negative association between 2D:4D and negative reciprocity. Finally, there is only one previous study testing the relation between 2D:4D and trust. Buser (2012) finds a statistically significant positive association between 2D:4D and trust in the first stage of the trust game. Based on this study we test the hypothesis that there is a statistically significant positive association between 2D:4D and trust.
1 The recent paper by van Leeuwen et al. (2020), which is more thoroughly discussed in the Discussion section, also studies 2D:4D and, among other things, the economic behaviors trust, positive reciprocity and negative reciprocity, and finds no evidence for associations between these variables and 2D:4D.
The aforementioned studies are typically small and with many "researcher degrees of freedom" in the analysis (Gelman & Loken, 2013; Simmons et al., 2011). For example, sometimes researchers focus on one hand and not the other, or take the average of the two hands, or look at subsamples and find something statistically significant in one subsample but not the other, and it is not clear ex-ante why there should only be an association in one group and not others. This type of analysis, sometimes referred to as "forking" as in "the garden of forking paths" (Gelman & Loken, 2013), can occur when the researcher has a specific hypothesis but has not specified exactly how the data will be analyzed, so they let the data decide how to analyze it. Forking (and more intentionally misleading analyses) can lead to results with p-values that cannot be interpreted as false positive probabilities: statistically significant results can easily be more likely to be false positives than true positives (Simmons et al., 2011). The average total sample size for the 2D:4D studies discussed in Parslow et al. (2019) is N=262, with a range from 86 to 704 participants. In addition, most of these studies look at smaller subsamples based on gender or ethnicity, thus effective sample sizes tend to be substantially smaller. There is thus a need for a much larger study with a pre-registered analysis plan to rigorously test if 2D:4D is associated with economic preferences.
For our primary hypotheses tests, we focus on the right hand 2D:4D because it has been argued that the right hand 2D:4D is more sexually dimorphic than the left hand (Hönekopp & Watson, 2011). We test if the right hand 2D:4D is associated with the five measures of economic preferences in five regression analyses controlling for gender. We use the left hand 2D:4D and the average of the two hands in our robustness tests. Since previous literature sometimes reports finding correlations in one gender and not the other, we perform a pre-registered exploratory analysis where we interact 2D:4D with gender to test if the associations differ between men and women. As some previous studies also test for a non-linear relationship between 2D:4D and economic preferences, we also carry out a pre-registered exploratory analysis adding 2D:4D squared to the regression analysis.

Data collection and sample
The study was pre-registered prior to starting the data collection (https://osf.io/5vpdn/). The pre-registration included the data collection, measurement procedures and all statistical tests described below (including the exploratory tests and the robustness tests, with the exception of the robustness tests using the corrected sample and the restricted sample, which were not pre-registered). The pre-registration did not include the instructions to the interviewers, but they are now posted on the OSF site. Any deviation from the pre-registration is explicitly mentioned in the text. The data collection of the study was performed between September 2018 and December 2018 in the German Socioeconomic Panel-Innovation Sample (Goebel et al., 2019) and we received the data in May 2019. SOEP is a longitudinal survey study that has been running since 1984 and it has nearly 30,000 participants today. SOEP-IS, on the other hand, was established in 2011 (Richter & Schupp, 2015) and it is open for experimental and survey module suggestions from researchers every year. The survey committee decides which of the suggested modules to include in the next wave. The selected modules are subsequently integrated into different sub-samples of SOEP-IS. The right to use the collected data belongs to the applicant researchers for a year. After the one-year embargo period, the data become available to the whole research community. The merged dataset also includes longitudinal socio-economic data, as well as previous years' modules. If a set of individuals participated in Modules A & B in 2014 and C & D in 2016, all their answers to these four modules can thus be merged into a single dataset. According to the 2018 release of SOEP-IS, it has a total of 5,722 participants from 3,232 households.
We planned to survey the whole sample of SOEP-IS for the 2D:4D collection module. While the number of participants was predicted to be around 4,500 adults prior to the survey, the actual number was 4,860. As 2D:4D measurement was voluntary, the participants could skip the measurement and continue responding to the survey questions. As a result, our working sample consisted of 3,482 participants with complete right or left hand measures (3,433 individuals with right, 3,454 with left, 3,405 with both). In addition to the 2D:4D measurements, the participants were also asked to reply to a set of survey questions about their economic and social preferences. The gender distribution of the working sample is balanced (53.8% women) and the mean age is 54.22 (SD=18.39).

Economic preference measures
The 2D:4D module included a total number of 8 survey questions to elicit risk taking, altruism, positive reciprocity, negative reciprocity and trust (see Supplementary Information for the phrasing of these 8 survey questions). All questions were adapted from the preference survey module of the seminal paper of , who followed both experimental tests and survey optimization procedures to construct their module (see also  for a detailed description of the preference module construction procedure).
The risk question is the standard risk question that has also been included in past waves of SOEP, where participants are asked "How do you rate yourself personally? In general, are you someone who is ready to take risks or do you try to avoid risks (risk-averse)?". For this question, participants are asked to indicate their willingness to take risks on a Likert scale from 0 to 10 where a higher number indicates more risk taking. The Global Preference Survey involves an almost identical self-reported question ("In general how willing are you to take risks?") and also a staircase procedure with hypothetical lotteries. We chose to use the standard self-reported question, which has also been experimentally validated.
We measured altruism with two questions and negative reciprocity with three questions (see Supplementary Information for more details). The multiple survey questions were combined into a measure of altruism and a measure of negative reciprocity in the same way as in the Global Preference Survey. Falk et al. (2018) standardized responses on each survey question to z-scores and estimated weights for aggregating responses into the overall preference measure. We used the same approach and the same weights to aggregate survey questions into preference measures (see Supplementary Information for the weights of Falk et al., 2018 that we used). However, the index resulting from aggregating the z-scores for the individual questions will not in itself be a z-score (as the standard deviation of this index will not be equal to 1). We therefore also standardized the index for altruism and the index for negative reciprocity into z-scores, to ease the interpretation of results. This standardization of the indices was not done in the previous study using these two indices and it was not mentioned in the pre-analysis plan (it does not affect the results per se, only the units in which the results are reported). For comparability in results between our five preference measures we also standardized responses to z-scores for risk taking, positive reciprocity, and trust for the regression analyses testing the hypotheses.
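The two-step standardization described above can be sketched as follows. This is a minimal illustration only: the responses and the weights 0.6 and 0.4 are hypothetical placeholders (the actual weights are those listed in the Supplementary Information), and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_index(items, weights):
    """Weighted sum of z-scored items, re-standardized to a z-score.

    items   -- list of equally long response lists, one per survey question
    weights -- one aggregation weight per question (hypothetical here)
    """
    z_items = [zscore(col) for col in items]
    raw = [sum(w * z[i] for w, z in zip(weights, z_items))
           for i in range(len(items[0]))]
    # The weighted sum of z-scores does not itself have SD 1,
    # so the index is standardized a second time.
    return zscore(raw)


# Hypothetical responses of five people to two altruism questions
q1 = [0, 2, 5, 7, 10]
q2 = [100, 0, 50, 300, 200]
index = composite_index([q1, q2], weights=[0.6, 0.4])
```

The second call to `zscore` changes only the units of the index, not the ordering of respondents.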
To measure positive reciprocity, we integrated the self-assessment question ("When someone does me a favor, I am willing to return it.") in the global preference module but did not integrate the hypothetical choice item in the data-collection. The respondents were asked whether they agreed with this expression on a 10 point Likert Scale. Finally, trust was also measured by a Likert Scale from 0 to 10, where a higher number is more agreement. The trust item we used was "People are basically honest". Note that this phrasing of the trust question is slightly different than the one we proposed to include in the data collection and that we included in the pre-analysis plan ("As long as I am not convinced otherwise, I assume that people have only the best intentions"). As the SOEP-IS questionnaire already had the former item included in another module and also three other trust related items, the survey management decided to include the shorter version rather than the version we proposed. As the data for the module using the trust question were collected in only a sub-sample of SOEP-IS, we have a smaller number of observations for trust than planned and a smaller number of observations than for the other preference measures.
The distributions of the five economic preference measures are presented in the Supplementary Information (see Figures S6-S10).

2D:4D measurements
Both left and right hand 2D:4Ds of the participants were measured during the household surveys with the help of digital calipers. We chose a direct measurement method, instead of scanners or mobile applications, for reasons of confidentiality of personal information as well as mobility and time efficiency. Scanned images of the hands would contain fingerprints of the participants, which may be considered sensitive personal information. Any discomfort or suspicion among participants due to the use of scanned images may also result in their dropping out of the study in further waves.
Moreover, as the data were collected from a large number of households, digital calipers are easier to carry than flatbed scanners and they are also time and cost effective. While there are several mobile applications being developed for 2D:4D measurements, they are currently available only in demo versions. 263 interviewers were trained for the 2D:4D measurements in August 2018. To minimize measurement errors, they were given a detailed hand measurement protocol that we prepared. The protocol instructed interviewers about i) calibrating and using the digital calipers, ii) preparation for measurements (e.g. seating and hand positions, taking off jewelry), iii) finger measurement, and iv) a second measurement if the 2D:4D ratio fell substantially outside the usual range of 0.8 to 1.1. This range was determined by the minimum and maximum 2D:4D ratios in previous studies (i.e. Lima de Miranda et al., 2018; Neyse et al., 2016; Neyse et al., 2020). The interviewers were asked to record both first and second measurements if they measured the finger lengths for the second time. The complete hand measurement protocol has been posted as an Appendix to the pre-analysis plan (it was not posted at the same time as the pre-analysis plan, but after the data collection was completed). This protocol was prepared following a set of tests on different types of digital calipers, measurement methods and multiple measurers. To test the protocol, two research assistants in SOEP measured a number of hands using the calipers and the proposed protocol. The measures obtained by the research assistants were compared with digital measurements carried out by one of the authors of this study, LN. The measures obtained with the final protocol were almost identical to those obtained with digital scanners.

Statistical analysis
The five primary hypotheses were tested in OLS regressions estimated with robust (Huber-White) standard errors. The survey measure of economic preferences was the dependent variable and the right hand 2D:4D and the gender of the participant were the independent variables. The hypothesis tests tested if the coefficient of the right hand 2D:4D was significant in the hypothesized direction. For each hypothesis we included all participants with data on right hand 2D:4D and gender who responded to the survey question used to construct the preference measure tested in the hypothesis. For altruism and negative reciprocity, where the preference measures were based on combining answers to more than one survey question (see Supplementary Information), we included all participants who responded to all the survey questions used to construct the preference measure (thus we did not include potential participants who answered only one question for these measures).
We included all participants with a recorded right hand 2D:4D measurement and we did not exclude any 2D:4D observations in our main results. However, as explained in more detail at the beginning of the Results section, we added two robustness tests that were not pre-registered to test if our results are sensitive to outliers due to mismeasurement or injured fingers. These robustness tests are referred to as "the corrected sample" and "the restricted sample" and these results are reported in Supplementary Information Tables S2-S7, S11-S16, S20-S25.
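As an illustration of the primary specification, the sketch below regresses a z-scored preference measure on 2D:4D and a female dummy and computes heteroskedasticity-robust standard errors with the sandwich formula. It is not the code used for the analysis: the data are simulated, the variable names are hypothetical, and the HC1 small-sample scaling is an assumption (the text specifies only Huber-White standard errors).

```python
import numpy as np


def ols_robust(y, X):
    """OLS point estimates with heteroskedasticity-robust (HC1) standard errors.

    y -- outcome vector of length n
    X -- n x k design matrix that already contains a column of ones
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv * n / (n - k)  # HC1 scaling (assumed)
    return beta, np.sqrt(np.diag(cov))


# Simulated data, not SOEP-IS data: the outcome is unrelated to 2D:4D by construction
rng = np.random.default_rng(0)
n = 500
female = rng.integers(0, 2, n)
ratio = rng.normal(0.97, 0.04, n)             # hypothetical right hand 2D:4D
risk_z = -0.3 * female + rng.normal(0, 1, n)  # hypothetical z-scored risk taking
X = np.column_stack([np.ones(n), ratio, female])
beta, se = ols_robust(risk_z, X)
```

Here `beta[1]` is the 2D:4D coefficient that the primary hypothesis tests refer to and `se[1]` is its robust standard error.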

Pre-registered exploratory analyses 1: Gender interactions
For the five primary hypotheses we tested if the association between right hand 2D:4D and the survey measures differed between men and women. These tests were done by adding an interaction between gender and the right hand 2D:4D in the five OLS regressions used to test the five primary hypotheses. We tested if this interaction coefficient was statistically significant, but without specifying a hypothesized direction of the interaction coefficient as these analyses were exploratory and would need confirmation in other studies to carry much weight. We also reported the statistical significance of the right hand 2D:4D for men and women separately as part of these analyses, but these tests are only relevant if the interaction coefficient is statistically significant.
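The interaction specification can be sketched as follows, under the assumption that gender enters as a female dummy. With that coding, the 2D:4D coefficient is the slope for men, and the slope for women is the sum of the 2D:4D coefficient and the interaction coefficient. The data are simulated and the sketch returns point estimates only; significance tests would additionally require the robust standard errors.

```python
import numpy as np


def interaction_fit(y, ratio, female):
    """Fit y = b0 + b1*ratio + b2*female + b3*(ratio*female) by least squares."""
    ratio = np.asarray(ratio, dtype=float)
    female = np.asarray(female, dtype=float)
    X = np.column_stack([np.ones_like(ratio), ratio, female, ratio * female])
    b, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return {"slope_men": b[1], "slope_women": b[1] + b[3], "interaction": b[3]}


# Simulated data with a true slope of -2 for men and +1 for women
rng = np.random.default_rng(1)
n = 400
female = rng.integers(0, 2, n)
ratio = rng.normal(0.97, 0.04, n)
y = np.where(female == 1, 1.0, -2.0) * ratio + rng.normal(0, 0.01, n)
fit = interaction_fit(y, ratio, female)
```

Because the model is fully interacted with gender, the two slopes coincide with those from separate regressions for men and women, which is why the separate tests are informative only when the interaction itself is significant.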

Pre-registered exploratory analyses 2: Non-linear relationships
Some studies in the literature have reported non-monotonic findings between 2D:4D and economic preferences, by adding a squared 2D:4D term to a linear regression (see for example Brañas-Garza et al., 2013;Galizzi and Nieboer, 2015). As exploratory analyses we therefore added the squared right hand 2D:4D to the OLS regression equations used for testing the five primary hypotheses, to test for non-linear effects of 2D:4D. In these tests we assessed the significance of the squared right hand 2D:4D, but without specifying a hypothesized direction of the squared term as these analyses are exploratory.
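A minimal sketch of the quadratic specification is given below, again on simulated data with hypothetical variable names. If the squared term is non-zero, the fitted relationship changes direction at -b1/(2*b2), where b1 and b2 are the coefficients of 2D:4D and squared 2D:4D.

```python
import numpy as np


def quadratic_fit(y, ratio, female):
    """Fit y = b0 + b1*ratio + b2*ratio^2 + b3*female by least squares."""
    ratio = np.asarray(ratio, dtype=float)
    X = np.column_stack([np.ones_like(ratio), ratio, ratio ** 2,
                         np.asarray(female, dtype=float)])
    b, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    turning_point = -b[1] / (2 * b[2])  # where the fitted curve changes direction
    return b, turning_point


# Simulated inverted-U relationship peaking at a 2D:4D of 0.97
rng = np.random.default_rng(2)
n = 400
female = rng.integers(0, 2, n)
ratio = rng.normal(0.97, 0.04, n)
y = -50 * (ratio - 0.97) ** 2 + 0.2 * female + rng.normal(0, 0.01, n)
coefs, peak = quadratic_fit(y, ratio, female)
```

Since the squared term gives disproportionate weight to extreme values of 2D:4D, estimates from this specification can be sensitive to a few outlying digit ratios.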

Pre-registered robustness tests: Left hand 2D:4D and average of left and right hand 2D:4D
As a robustness test we also estimated all the analyses above using the left hand 2D:4D and the average of the left hand and the right hand 2D:4D instead of the right hand 2D:4D (in the existing literature the right hand 2D:4D, the left hand 2D:4D and the average of the two have been used). In the robustness test using the left hand 2D:4D we included all participants with data on left hand 2D:4D (and with data on gender and the preference measure tested in each hypothesis as above). In the robustness test using the average we included all participants with data on both left and right hand 2D:4D (and with data on gender and the preference measure tested in each hypothesis as above).
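The inclusion rules for the three 2D:4D measures can be sketched as follows. The record layout (dictionaries with the keys `right`, `left`, `female` and `outcome`, and `None` for missing values) is a hypothetical choice made for the illustration.

```python
def hand_measures(right, left):
    """Return the three 2D:4D regressors for one participant.

    right, left -- 2D:4D of each hand, or None if that hand was not measured.
    The average is defined only when both hands were measured.
    """
    both = right is not None and left is not None
    return {"right": right, "left": left,
            "mean": (right + left) / 2 if both else None}


def analysis_sample(participants, measure):
    """Keep participants with the chosen 2D:4D measure, gender and outcome."""
    rows = []
    for p in participants:
        value = hand_measures(p["right"], p["left"])[measure]
        if value is not None and p["female"] is not None and p["outcome"] is not None:
            rows.append((value, p["female"], p["outcome"]))
    return rows


# Hypothetical records: one complete, two with a single hand, one without outcome
participants = [
    {"right": 0.95, "left": 0.99, "female": 1, "outcome": 0.3},
    {"right": None, "left": 0.97, "female": 0, "outcome": -0.1},
    {"right": 0.96, "left": None, "female": 1, "outcome": 1.2},
    {"right": 0.98, "left": 0.96, "female": 0, "outcome": None},
]
n_right = len(analysis_sample(participants, "right"))
n_left = len(analysis_sample(participants, "left"))
n_mean = len(analysis_sample(participants, "mean"))
```

The sample for the average is therefore never larger than the sample for either hand alone.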

Results
In this section, we report our results in accordance with our pre-registered pre-analysis plan. In the pre-analysis plan, we did not include any criteria for excluding outliers (we wrote that all observations would be included). But the instructions to the interviewers for measuring 2D:4D in SOEP included some steps to prevent outliers due to mismeasurement or injured fingers. The interviewers were told that the 2D:4D typically lies between 0.8 and 1.1, and that they should repeat the measurement if the measured ratio fell substantially outside this range. For some individuals there therefore exist two recorded measurements of the digit ratio (a first measurement and a second measurement due to a suspected first mismeasurement). As we did not write anything in the pre-analysis plan about replacing the first measurement with the second measurement in these cases, we report results based on the first measurement for all individuals as our main results below. We supplement these results with an additional robustness test that was not pre-registered, where we replace the first measurement with the second measurement for all individuals where a second measurement is available in the data (a second measurement of the right/left hand digit ratio was available for 90/149 individuals in the data). We carry out this robustness test for the primary hypotheses, the pre-registered exploratory analyses and the pre-registered robustness tests reported below and these results are reported in Tables S2-S4, S11-S13, and S20-S22. We refer to this robustness test as "the corrected sample".
If the interviewee had a missing or severely injured second digit (2D) or fourth digit (4D), the interviewers were instructed not to measure the injured hand and to continue with the other hand. But from comments added by interviewers in the data set, it is evident that in some cases they measured the injured hand and then added a comment about an injured finger. In our main results below, we include all the measured digit ratios even if there was a comment about an injured finger in the data set (as that is most consistent with our pre-analysis plan). But we have added an additional robustness test that was not pre-registered, where we have excluded digit ratios outside the range of 0.8-1.2 (this range corresponds to about four standard deviations on either side of the mean in our data), as it is very unusual to observe values outside this range in previous studies (in the data set we observed 14/30 digit ratios outside this range for the right/left hand digit ratio). This implies that digit ratios that are outliers due to injured fingers will be excluded from the analysis. We chose this approach instead of trying to identify every recorded measurement in the data set that was affected by injured fingers, as it is not always obvious from the interviewers' comments if fingers are severely injured or not. We carry out this robustness test for the primary hypotheses, the pre-registered exploratory analyses and the pre-registered robustness tests reported below and these results are reported in Tables S5-S7, S14-S16, and S23-S25. We refer to this robustness test as "the restricted sample". For our restricted sample mean 2D:4D analyses we generated the independent variable, m2D:4D, by calculating the average of the restricted right and restricted left hand 2D:4Ds.
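The construction of the two additional samples can be sketched as below. Several details are assumptions made for the illustration: missing values are coded as `None`, the bounds 0.8 and 1.2 are treated as inclusive, and the restricted mean requires both hands to lie within the range.

```python
def corrected(first, second):
    """Corrected sample: use the second measurement whenever one was recorded."""
    return second if second is not None else first


def restricted(ratio, low=0.8, high=1.2):
    """Restricted sample: treat digit ratios outside 0.8-1.2 as missing."""
    return ratio if ratio is not None and low <= ratio <= high else None


def restricted_mean(right, left):
    """Average of the restricted right and left hand ratios (both required)."""
    r, l = restricted(right), restricted(left)
    return (r + l) / 2 if r is not None and l is not None else None


# Hypothetical participant remeasured after an implausible first reading
ratio_corrected = corrected(first=1.15, second=0.98)
# A ratio of 2.175, as recorded for one injured finger, is dropped
ratio_restricted = restricted(2.175)
```

In the full sample neither function is applied, so the first recorded measurement enters the analysis unchanged.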
When we report our results for the pre-registered tests below, we comment in the text if the findings differ importantly in the tests that were not pre-registered, in "the corrected sample" or "the restricted sample" (and we refer to the results below as "the full sample"). Following the recommendations of a recent paper (Benjamin et al., 2018), we consider results with p-values below 0.005 to be statistically significant and results with p-values below 0.05 as suggestive evidence; all statistical tests are two-sided (these p-value cut-offs were already mentioned in the pre-registration).
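The labelling rule for p-values stated above amounts to the following; the function name and the label for p-values of 0.05 or more are our own choices for the illustration.

```python
def classify(p_value):
    """Label a two-sided p-value using the 0.005 and 0.05 cut-offs."""
    if p_value < 0.005:
        return "statistically significant"
    if p_value < 0.05:
        return "suggestive evidence"
    return "not significant"


labels = [classify(p) for p in (0.001, 0.02, 0.3)]
```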

Descriptives
Descriptives of the dependent and independent variables are shown in Table 1 (and in Table S1 for the corrected sample and the restricted sample), with results reported separately for men and women (the results for the preference variables are based on the n=3,482 individuals with at least one of the right or left hand 2D:4D data in the full sample). We also report the p-value of the gender difference of each variable based on an independent samples t-test, but these tests were not part of our pre-registered hypothesis tests. As conjectured in the 2D:4D literature, 2D:4D tends to be lower for men than for women. This difference is statistically significant for the left hand digit ratio and the mean of the left hand and right hand digit ratio, but not for the right hand digit ratio in the full sample. In the corrected sample and the restricted sample there is suggestive evidence (p<0.05) for a lower digit ratio for men also for the right hand digit ratio.
In line with previous literature on gender differences in economic preferences (Croson & Gneezy, 2009), men are on average more risk taking than women and less altruistic. Using the same measures as we do, Falk et al. investigated economic and social preferences in 76 countries and with almost 80,000 participants. They found systematic gender differences in positive reciprocity, negative reciprocity and trust, with women on average being more likely to positively reciprocate, less likely to negatively reciprocate and more trusting. We confirm that finding for negative reciprocity (p<0.001) and positive reciprocity (p<0.05), whereas trust does not vary statistically significantly between men and women in our data.

Primary Hypotheses
The primary hypothesis tests, robustness tests, and exploratory analyses reported below were pre-registered prior to starting the data collection. In Table 2 we present the results for our primary hypotheses in regressions with the economic preference measure as a function of 2D:4D and gender. Note that in the regressions we have standardized all the five preference measures to z-scores so that one unit in the dependent variable represents a standard deviation of the preference measure. We cannot reject the null hypothesis of no association between 2D:4D and economic preferences for any of the five preference measures; and the sign of the coefficient is in the opposite direction of the hypothesis for three of the five hypotheses. The female coefficients confirm the gender differences in preferences observed in the descriptives data.
In Figure 1 we graphically illustrate our results for the five primary hypotheses, by showing 95% and 99.5% confidence intervals of the estimated regression coefficients for the association between 2D:4D and economic preferences. To ease the interpretation of effects we report these results as standardized regression coefficients so that the results show how many standard deviation units economic preferences change for a one standard deviation change in 2D:4D. For risk taking, altruism, positive reciprocity and negative reciprocity the upper and lower limit of the 99.5% confidence interval is within about a +/-0.05 standard deviation change in economic preferences for a one standard deviation change in 2D:4D. The confidence intervals for trust are somewhat wider due to the lower sample size for trust (the trust question was only included in a sub-sample of SOEP-IS).
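The conversion to standardized coefficients and the two confidence levels can be sketched as follows. The input numbers are hypothetical rather than taken from Table 2, and the normal approximation to the critical values is an assumption (with several thousand observations it is very close to the t distribution).

```python
from statistics import NormalDist


def standardized_ci(beta, se, sd_x, sd_y=1.0, levels=(0.95, 0.995)):
    """Rescale a coefficient to SD units and return two-sided confidence intervals.

    beta, se -- unstandardized coefficient of 2D:4D and its standard error
    sd_x     -- standard deviation of 2D:4D
    sd_y     -- standard deviation of the outcome (1 for z-scored preferences)
    """
    scale = sd_x / sd_y
    b_std, se_std = beta * scale, se * scale
    intervals = {}
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
        intervals[level] = (b_std - z * se_std, b_std + z * se_std)
    return b_std, intervals


# Hypothetical inputs: coefficient 0.1, standard error 0.4, SD of 2D:4D 0.05
b_std, ci = standardized_ci(beta=0.1, se=0.4, sd_x=0.05)
```

The 99.5% interval corresponds to the 0.005 significance threshold and is always wider than the 95% interval.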
These results thus show precisely estimated null results, suggesting that 2D:4D is not associated with important effects on economic preferences. These results are reinforced by our high power to find significant associations. Based on the observed standard errors

Exploratory Analyses 1: Gender Interactions
We pre-registered two exploratory analyses. The first of these tests for a gender difference in the association between 2D:4D and economic preferences by adding an interaction between gender and 2D:4D to the regressions. In testing the gender interaction coefficient we also report whether 2D:4D is statistically significantly associated with economic preferences among men or women. The results, including the interaction effect tests, are reported in Table 3. The gender interaction is not statistically significant in any of the
The suggestive evidence for the gender interaction is thus likely to be due to randomness. This is reinforced by the fact that there is no statistically significant or suggestive evidence of this gender interaction in the corrected sample or the restricted sample. In the restricted sample there is instead suggestive evidence of a positive gender interaction for risk taking, with a negative point estimate of the association between 2D:4D and risk taking for men and a positive point estimate for women. However, also in this case the point estimate is not statistically significant for either men or women (and there is no suggestive evidence of an association either) (see Table S6).
These results are reported in Table 4. The squared 2D:4D coefficient is not statistically significant in the five regression equations, with the exception of negative reciprocity. For negative reciprocity the squared 2D:4D term is statistically significant with a negative coefficient (and the 2D:4D term has a positive coefficient). This result is also observed in the corrected sample, but not in the restricted sample, suggesting that it is driven by outliers caused by injured fingers (see Tables S4 and S7). In checking the data, there is one extreme right hand digit ratio of 2.175 in the full sample and in the corrected sample, but not in the restricted sample. According to the comments by the interviewers this is due to an injury of the 4D finger.
If this outlier is removed the squared 2D:4D term is no longer statistically significant in the full sample (and there is no suggestive evidence either). We thus find no robust evidence of a non-linear relationship between 2D:4D and economic preferences. Without this extreme outlier the suggestive evidence for the 2D:4D gender interaction and negative reciprocity in Table 3 above also disappears.
As pre-registered robustness tests we estimated the primary hypotheses and exploratory analyses also for the left hand 2D:4D and the average of the left hand and right hand 2D:4D. For the full sample there is no statistically significant or suggestive evidence of an association between 2D:4D and any of the economic preferences in any of the robustness tests for the primary hypotheses or the exploratory analyses (see Tables S8-S10 and S17-S19).

Exploratory Analyses 2: Non-Linear Relationships
We also carried out the analyses using the left hand 2D:4D and the average of the left hand and right hand 2D:4D using the corrected sample and the restricted sample (see Tables S11-S16 and S20-S25). As noted above these additional robustness tests were not pre-registered. In these analyses there is no statistically significant or suggestive evidence in the robustness tests of the primary hypotheses or the exploratory tests of a gender interaction. But in three cases the squared 2D:4D coefficient is statistically significant in the robustness tests of a non-linear relationship. For negative reciprocity the squared 2D:4D term for the average 2D:4D of both hands is statistically significant in the corrected sample with a negative coefficient (and the 2D:4D term has a positive coefficient) (see Table S22). As above for right hand 2D:4D, this result seems to be due to outliers, as the squared term is not statistically significant in the restricted sample (see Table S25), and the result is not robust to removing the one extreme outlier due to an injured finger noted above (there is not even suggestive evidence for the squared term without this observation).
The other two cases with a statistically significant squared term are for risk taking in the restricted sample; here the squared term is statistically significant in both the regression for left hand 2D:4D and the regression for the average of the left hand and right hand 2D:4D (see Tables S16 and S25). In both these regressions the squared term has a negative coefficient (and the 2D:4D term has a positive coefficient). This implies that risk taking initially increases with 2D:4D until the function peaks, and then it decreases.
For left hand 2D:4D the function peaks at 0.97 and for the average of the right hand and the left hand 2D:4D the function peaks at 0.98. The hypothesized sign of 2D:4D and risk taking is a negative sign, which implies that for a sizeable part of the 2D:4D distribution the direction of the association is in the opposite direction of this hypothesis in these models with the squared term (the lower part of the 2D:4D distribution until the function peaks). The results for these two regressions with the squared term are thus not fully consistent with the hypothesized negative association, and we think they are likely to be chance findings. These analyses are also part of the exploratory analyses, and would need confirmation in future confirmatory studies to provide substantive evidence of a non-linear association between 2D:4D and risk taking.
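The location of such a peak follows directly from the two coefficients: for a fitted curve b1*x + b2*x^2 with b2 < 0, the maximum lies at x = -b1 / (2*b2). The sketch below illustrates the arithmetic. The coefficient values are hypothetical, chosen only so that the peak falls at 0.97; they are not the estimates from Tables S16 or S25.

```python
def quadratic_peak(b_linear, b_squared):
    """Return the x value at which b_linear*x + b_squared*x**2 is maximized."""
    if b_squared >= 0:
        raise ValueError("an interior maximum requires a negative squared coefficient")
    return -b_linear / (2 * b_squared)

# Hypothetical coefficients (not taken from the supplementary tables).
b_2d4d, b_2d4d_squared = 1.94, -1.0
peak = quadratic_peak(b_2d4d, b_2d4d_squared)
print(peak)  # below this 2D:4D value the fitted association is positive
```

For 2D:4D values below the peak the implied association with risk taking is positive, which is the part of the distribution that runs against the hypothesized negative sign.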

Discussion
Our null results suggest that variation in 2D:4D does not explain any important variation in economic preferences between individuals, at least as measured by the measures included here. The point estimates for the five measures of economic preferences are close to zero and are precisely estimated, with 99.5% confidence intervals within about a +/-0.05 standard deviation change in economic preferences for a one standard deviation change in 2D:4D (with the exception of trust, which has somewhat wider confidence intervals due to the smaller sample size).
Why do we find null results while several smaller studies have found indications of 2D:4D correlating with economic preferences? This could be due to the "usual suspects" increasing the risk of false positive results: small sample sizes, publication bias and "researcher degrees of freedom" such as "forking". For example, in most previous papers there is room to test whether there is a statistically significant association between 2D:4D and the studied behavior for the right hand, or for the left hand, or for the average of the two hands, or for men only, or for women only, or for the pooled sample of men and women, etc. Even in studies where researchers are testing specific hypotheses, there are typically some degrees of freedom in how these hypotheses are tested, and it is easy to convince oneself that the statistically significant result is the one that was always intended as the focus. (We do not think that most researchers are "p-hacking" in the sense that they are actively trying to find only statistically significant results somewhere and only communicate these to the world, but that "forking" is very natural and hard to avoid without a pre-analysis plan.) Our sample size was about 5 times larger than the previously largest study in this field, and the pre-registration of our hypothesis tests leaves little room for "researcher degrees of freedom" affecting our results.
Another interpretation of our findings is that testosterone exposure in utero is unimportant for economic preferences. Recent evidence by van Leeuwen et al. (2020) suggests that this may be the case: using direct measures of testosterone and estrogen from umbilical cord blood at birth, they find no evidence for an association between these hormones and economic preferences (risk preferences, competitiveness, time preferences and social preferences) in a sample of 217 adults. In this study the authors also find no evidence for associations between 2D:4D and economic preferences, including risk preferences, positive reciprocity, negative reciprocity and trust, in a sample of 597 adults. Another possibility is that 2D:4D might not be a valid proxy for testosterone exposure in utero. The evidence of a link between 2D:4D and testosterone exposure, while coming from several strands of literature, is mainly indirect and the results are often inconsistent. First, it has been argued that 2D:4D is lower in men than in women. While men on average have a lower 2D:4D also in our study, the difference is small, and this result is not always found (for example Apicella et al. (2015) find no gender difference in a large sample of the Hadza population). Second, the most direct evidence comes from a study on the testosterone-to-estradiol ratio in amniotic fluid in a sample of 29 children (Lutchmaya et al., 2004).
They report a statistically significant negative correlation between this ratio and right hand 2D:4D, even after controlling for gender, while the correlation for the left hand is reported as insignificant. Similarly, another study finds a weak negative correlation between the testosterone levels in the mother's plasma and 2D:
In sum, while there is some evidence in support of a link between 2D:4D and testosterone exposure in utero, the results do not provide strong evidence and it should be a research priority to establish if such a link actually exists based on a large sample pre-registered study. We would also encourage pre-registered large sample replication studies on the association between 2D:4D and other traits previously linked to 2D:4D such as personality, sexual orientation and spatial ability, as well as similar studies on economic preferences using incentivized measures.
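The effect of "forking" on false positives can be illustrated with simple arithmetic. The sketch below computes the probability of at least one statistically significant result when several specifications are tried on data with no true effect. It assumes the tests are independent, which will not hold exactly for specifications that share the same respondents (right hand, left hand, average, men, women, pooled), so the figure is a rough illustration rather than an estimate for any particular study. The function name is ours.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Six candidate specifications, as in the example above: right hand, left hand,
# average of both hands, men only, women only, pooled sample.
rate = familywise_error_rate(0.05, 6)
print(f"{rate:.3f}")
```

Under the independence assumption, six specifications at the 5% level give roughly a one-in-four chance of at least one nominally significant result, even when no association exists.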

Open practices statement
The pre-analysis plan and hypotheses of this study were formally pre-registered in Open Science Framework prior to data collection (https://osf.io/5vpdn/).
Supplementary materials: Supplementary text, Figs. S1 to S10, Tables S1 to S25, SI References.
Fig. S1. Standardized regression coefficients of R2D:4D for primary hypotheses based on corrected sample in Table S2. Standardized R2D:4D (y-axis) was used to generate the figure. Interval bars refer to 95% and 99.5% confidence intervals.
Table S5. Standardized R2D:4D (y-axis) was used to generate the figure. Interval bars refer to 95% and 99.5% confidence intervals.