Published in Nature, 22 June 2022
by Matthew B. Ross, Britta M. Glennon, Raviv Murciano-Goroff, Enrico G. Berkes, Bruce A. Weinberg & Julia I. Lane
There is a well-documented gap between the observed number of works produced by women and by men in science, with clear consequences for the retention and promotion of women1. The gap might be a result of productivity differences2,3,4,5, or it might be owing to women’s contributions not being acknowledged6,7. Here we find that at least part of this gap is the result of unacknowledged contributions: women in research teams are significantly less likely than men to be credited with authorship. The findings are consistent across three very different sources of data. Analysis of the first source—large-scale administrative data on research teams, team scientific output and attribution of credit—shows that women are significantly less likely than their male peers to be named on a given article or patent produced by their team. The gender gap in attribution is present across most scientific fields and almost all career stages. The second source—an extensive survey of authors—similarly shows that women’s scientific contributions are systematically less likely to be recognized. The third source—qualitative responses—suggests that women are less likely to be credited because their work is often not known, is not appreciated or is ignored. At least some of the observed gender gap in scientific output may be owing not to differences in scientific contribution, but rather to differences in attribution.
Gender differences in observed scientific output are well-documented: women both publish and patent less than men1. The causes of these differences are not well understood. Analysis using individual data has suggested that women are less productive because they work in less welcoming work environments2, have greater family responsibilities3, have different positions in the laboratory4 or differ in the type of supervision they are provided5. Recent work has suggested that women are not less productive, but rather that their work is undervalued8. The analysis in this Article uses new data on research teams to suggest that women are accorded less credit than men: they are systematically less likely to be named as authors on articles and patents.
The possibility that women receive less recognition for their scientific contributions is not hypothetical: the canonical example is that of Rosalind Franklin. Franklin’s pivotal contribution to the discovery of the structure of DNA initially went unrecognized6, and it was not until long after she died that the scientific community became aware that she was wrongfully denied authorship on the original Crick and Watson paper. Indeed, her contribution was apparently only recognized because Watson’s account of the discovery was so incorrect9 and stimulated a reconstruction of events by Franklin’s friends10. More recently, Walter Isaacson recounts Jennifer Doudna’s concern that she and Emmanuelle Charpentier were being relegated to “minor players” in the history and commercial use of CRISPR-Cas97. The open questions, of course, are how many women’s contributions have been missed in similar but less high-profile circumstances, and how many women have been discouraged from pursuing careers in science as a result11.
Finding ‘what isn’t there’ from ‘what is there’ is a fundamental problem in statistics, and has been used to address such vastly different questions as calculating the return on investment of mutual funds (after accounting for funds that no longer exist) or the optimal placement of armour on aeroplanes in the Second World War12 (after accounting for those that did not return). The problem of selecting on the dependent variable is also prevalent in the social sciences; for example, in only observing the labour supply of people who participate in the labour market13 or studying the drivers of economic development by selecting a few successful industrializing countries14.
The first steps in identifying the missing data in these two examples are to describe the population from which the sample of observations is drawn and then to document the degree of missingness. Subsequent steps then characterize the sources of the missingness. The large-scale bibliometrics databases used to study scientific output consist only of named authors or inventors (not unnamed contributors), and cannot be used to find who is not named; carefully curated case studies are too small to generalize15. The unique data on research teams used in this paper are, by contrast, fit for the purpose: they consist of information on 9,778 teams over a four-year period: the 128,859 individuals working in those teams, matched to 39,426 journal articles and 7,675 patents produced by those teams (Methods, ‘Construction of administrative data’). Because the data include information about the positions held by each individual on each team as well as their gender, it is possible to calculate for each individual whether they did or did not receive credit on a given article and to calculate differences by gender.
The evidence generated from the analysis described in this paper suggests that Rosalind Franklin is far from unique in not receiving credit for her work. If credit is defined simply as ever being named an author, women account for only 34.85% of the authors on a team, even though they make up just under half of the workforce (48.25%; Extended Data Table 2). When credit is defined as the likelihood of being listed as an author on a given document (relative to the mean) produced by a research team, there is a 13.24% gap for articles and a 58.40% gap for patents in the likelihood that women are named on any given article or patent produced by their team (Extended Data Table 4, column 5). The chances of women receiving credit on an article decrease by 4.78% relative to the baseline rate of 3.18% (P < 0.0001; two-sided t-test; test value = −3.8, effect size = −0.0015 percentage points (pp)) for each 1 log point increase in citations (Extended Data Table 7).
The results are confirmed by appealing to a completely different source of quantitative data—a survey of 2,660 scientists regarding the allocation of credit (Methods, ‘Survey design and collection’ and Supplementary Information, part 3). Exclusion from authorship is common and differs significantly by gender: 42.95% of women and 37.81% of men reported that they had been excluded from authorship (P = 0.0151; two-sided t-test; test value = −2.4327, effect size = −0.0514), and significantly more women (48.97%) than men (39.13%) report that others underestimated their contribution (P = 0.0036; two-sided t-test; test value = −2.9218, effect size = −0.0984).
Qualitative analysis—open-ended narrative statements by survey respondents as well as personal interviews with consenting authors (approach detailed in Methods, ‘Survey design and collection’ and ‘Qualitative evidence’ and Supplementary Information, part 3)—was also consistent. Authors noted that the rules of credit allocation were frequently unclear and often determined by senior investigators. A complex mix of factors, particularly field, rank, culture and gender, was identified. However, an overarching theme was that the rules governing scientific contributions were often not codified, not understood by all members of the research team, or simply ignored. The level of work required for authorship is often not clear to everyone participating on research teams, and the level of work deemed necessary to receive attribution can vary on the basis of idiosyncratic personal preferences and a team member’s relationship with the principal investigator (PI). Thus, women and other historically marginalized groups must often put in significantly more effort in order for their scientific contributions to be recognized.
Our analyses on administrative, survey, and qualitative data suggest that even 70 years later, the same factors that led to the denial of Rosalind Franklin’s authorship of the pivotal work on the structure of DNA are still at work. At least some of the observed gender gap in scientific output may not be owing to differences in scientific contribution, but to differences in attribution within research teams.
Attribution and administrative data
Unpacking the structure of research teams to understand whose work is not recognized requires identifying each individual on each research team, characterizing their position by their job title, and then determining whether or not they are named on the articles and patents produced by the team. Administrative data can provide highly granular information about who works on which research project, because human resources records both document every payment made from each grant during each pay period and record each employee’s job title. Currently, 118 campuses from 36 participating universities provide their deidentified data to the Institute for Research on Innovation and Science at the University of Michigan, which processes and standardizes the information as analytical files16. The earliest year for which data were provided by a participating institution was 2000 and the latest was 2019; the data include information on payments of wages from individual grants to all people employed by each grant, including the job title for which a person is paid on a particular grant (Methods, ‘Construction of administrative data’).
Teams were constructed around a central PI, their associated grants, and individuals employed on those grants from 2013–2016. The scientific field of each team is identified by using the title of all associated grants and comparing the grants with a pool of text that describes each scientific field using a ‘wiki-labelling’ approach17,18,19. Scientific documents were linked to a team if the article or patent acknowledged one of the team’s grants and/or any member of the team was listed as an author on that article or patent (further details in Methods, ‘Construction of administrative data’).
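The linking rule described above (a document belongs to a team if it acknowledges one of the team’s grants and/or lists a team member as an author) can be sketched as follows. This is a minimal illustration on hypothetical team, grant and person identifiers, not the paper’s actual data pipeline.

```python
# Sketch of the team-document linking rule, using hypothetical identifiers.

def link_documents_to_teams(teams, documents):
    """Attach each document to every team it matches.

    teams: dict team_id -> {"grants": set, "members": set}
    documents: list of {"doc_id", "acknowledged_grants": set, "authors": set}
    Returns dict team_id -> set of linked doc_ids.
    """
    links = {team_id: set() for team_id in teams}
    for doc in documents:
        for team_id, team in teams.items():
            # Linked if the document acknowledges a team grant
            # and/or any team member appears as an author.
            if (doc["acknowledged_grants"] & team["grants"]
                    or doc["authors"] & team["members"]):
                links[team_id].add(doc["doc_id"])
    return links

teams = {
    "T1": {"grants": {"G100"}, "members": {"pi_a", "student_b"}},
    "T2": {"grants": {"G200"}, "members": {"pi_c"}},
}
documents = [
    {"doc_id": "D1", "acknowledged_grants": {"G100"}, "authors": {"pi_a"}},
    {"doc_id": "D2", "acknowledged_grants": set(), "authors": {"pi_c"}},
]
links = link_documents_to_teams(teams, documents)
```

Here D1 links to T1 through both its grant acknowledgment and its author, while D2 links to T2 through authorship alone, mirroring the ‘and/or’ rule in the text.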
Attribution can be measured in many ways using these data. Three measures are constructed for the purposes of this paper: (1) the rate at which individuals are ever named as an author on any scientific document (the ‘ever-author’ rate); (2) the rate at which individuals are named as an author on a given scientific document produced by their team (the ‘attribution’ rate); and (3) the rate at which individuals are named on any given high-impact document (the ‘high-impact attribution’ rate) (Methods, ‘Analytical sample’).
The first and simplest measure is the ever-author rate, which characterizes an individual as an author if he or she was ever named as an author or an inventor during the analysis period. As shown in Table 1, 16.97% of individuals are classified as authors using this measure, but the probability that men are ever named is 21.17% whereas the probability for women is 12.15%. Table 1 also shows that there are two reasons for this gap: the junior positions of women in research teams, and under-representation in attribution given their position. First, women are less likely to hold the senior positions associated with ever being named an author (‘ever authorship’). The highest ever-authorship rate (45.70%) is for faculty members, yet only 11.30% of women (versus 19.72% of men) in the sample are faculty members. Conversely, the ever-authorship rate for research staff is 8.63%, yet 47.81% of women are research staff, compared with 28.73% of men. Second, holding the distribution of positions constant (at the grand means), women are 4.82% less likely to ever be named as authors. In the case of graduate students, for example, 14.97% of women are ever named as an author on a document, compared with 21.37% of men. The consequences of such disparities for the retention of senior women in, and the attraction of young women to, scientific careers are unlikely to be positive.
Table 1 Gender differences in position and ‘ever authorship’
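The decomposition above (rates by gender overall, and by gender within each position) can be computed directly from person-level records. The sketch below uses a small synthetic sample, since the administrative data are not public; the field names are illustrative assumptions.

```python
# Minimal sketch of the ever-author rate computation on synthetic records.
from collections import defaultdict

def ever_author_rates(people):
    """people: list of {"gender", "position", "ever_author": bool}.
    Returns (rate by gender, rate by gender-and-position cell)."""
    counts = defaultdict(lambda: [0, 0])  # gender -> [authors, total]
    cell = defaultdict(lambda: [0, 0])    # (gender, position) -> [authors, total]
    for p in people:
        g = p["gender"]
        counts[g][0] += p["ever_author"]
        counts[g][1] += 1
        cell[(g, p["position"])][0] += p["ever_author"]
        cell[(g, p["position"])][1] += 1
    rate = lambda tally: {k: a / n for k, (a, n) in tally.items()}
    return rate(counts), rate(cell)

people = [
    {"gender": "F", "position": "faculty", "ever_author": True},
    {"gender": "F", "position": "staff",   "ever_author": False},
    {"gender": "M", "position": "faculty", "ever_author": True},
    {"gender": "M", "position": "staff",   "ever_author": True},
]
by_gender, by_cell = ever_author_rates(people)
```

Comparing `by_gender` with `by_cell` separates the two sources of the gap identified in the text: differences in where women sit in the position distribution, and differences in attribution within a position.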
Although illustrative, the ever-author rate does not fully capture differential attribution. In our motivating example, Franklin could have been named as an author on some articles or patents emanating from the research team other than the DNA paper with Crick and Watson. The second authorship measure is the attribution rate, which represents the likelihood that a woman receives credit on a given scientific document produced by her research team.
The empirical implementation of what is a relatively straightforward conceptual framework is more difficult, but the data are rich enough to allow such calculations (see Methods, ‘Analytical sample’ for details). The denominator—the set of ‘potential authorships’—was created by associating all members of each team who were employed one year before the publication or application date to all associated articles or patents emanating from that team during the analysis period. Since some individuals, such as research staff, are on multiple teams, they are proportionately allocated across teams using a set of analytical weights (Methods, ‘Analytical sample’). The numerator—attribution—was defined as ‘actual authorships’ on those articles and patents. Thus, the attribution rate is the ratio of actual authorships to potential authorships. The overall attribution rate for any team member on either a patent or article is 3.2%. On average across all job titles and fields, women have a 2.12% probability of being named on any scientific document, whereas men are twice as likely to be named (4.23%) (P = 0.0000; two-sided t-test; test value = 19.5823, effect size = 2.11%; Extended Data Tables 2 and 3).
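The ratio described above (actual authorships over potential authorships, with multi-team members allocated proportionately) can be sketched on toy data. The even-split weight below is a simplifying assumption; the paper’s actual analytical weights are more involved (Methods, ‘Analytical sample’).

```python
# Sketch of the attribution-rate calculation on toy data.
# Assumption: a person on k teams gets weight 1/k on each potential
# authorship (a stand-in for the paper's analytical weights).

def attribution_rate(potential, actual, n_teams):
    """potential: list of (person, doc) pairs the person could have authored;
    actual: set of (person, doc) pairs actually authored;
    n_teams: dict person -> number of teams they appear on.
    Returns weighted actual authorships / weighted potential authorships."""
    num = den = 0.0
    for person, doc in potential:
        w = 1.0 / n_teams[person]  # proportional allocation across teams
        den += w
        if (person, doc) in actual:
            num += w
    return num / den

potential = [("ana", "D1"), ("ana", "D2"), ("ben", "D1"), ("ben", "D2")]
actual = {("ana", "D1")}
n_teams = {"ana": 1, "ben": 2}
rate = attribution_rate(potential, actual, n_teams)
```

In this toy sample, ana contributes two potential authorships at full weight and ben two at half weight, so one actual authorship yields an attribution rate of 1/3.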
The data are rich enough to examine whether the observed gender gap simply reflects gender differences in organizational position rather than attribution. We find that women in each position are systematically less likely than men to be named an author on any given article or patent for any given position that they occupy in the organization.
Figure 1 (and Supplementary Fig. 5) makes use of information in the data about each individual’s position in the organization—faculty, postdoc, graduate student, undergraduate student or research staff—as well as the research team’s field. Women occupy more junior career positions than men, and the proportion of women in each position declines as the seniority of the position increases (Fig. 1, left). At the most senior level, only 34.82% of faculty members are women; at the most junior, 60.81% of research staff are women.