🤖 AI Summary
Problem: Individual contributions in scientific teams remain difficult to identify, because conventional author ordering and self-reported statements suffer from bias and incomplete coverage. Method: Using LaTeX source files from 1.6 million papers (1991–2023), we introduce a large-scale, source-level analysis of the implicit division of labor in scientific writing, distinguishing conceptual sections (e.g., Introduction, Discussion) from technical ones (e.g., Methods, Experiments). We treat author-specific macros as fine-grained writing traces to quantify section-level contributions, avoiding reliance on positional heuristics or subjective declarations. Contribution/Results: We validate the author-section mapping against self-reported contribution statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman’s ρ = 0.6, *p* < 0.05). We release the first large-scale dataset of scientific writing contributions, which reveals a division of labor between conceptual and technical sections and provides a scalable, empirical foundation for equitable author credit allocation and research evaluation reform.
📝 Abstract
Recognition of individual contributions is fundamental to the scientific reward system, yet coauthored papers obscure who did what. Traditional proxies (author order and career stage) reinforce biases, while contribution statements remain self-reported and limited to select journals. We construct the first large-scale dataset on writing contributions by analyzing author-specific macros in LaTeX files from 1.6 million papers (1991–2023) by 2 million scientists. Validation against self-reported statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman's rho = 0.6, p < 0.05) confirms the reliability of the resulting data. Using explicit section information, we reveal a hidden division of labor within scientific teams: some authors primarily contribute to conceptual sections (e.g., Introduction and Discussion), while others focus on technical sections (e.g., Methods and Experiments). These findings provide the first large-scale evidence of implicit labor division in scientific teams, challenging conventional authorship practices and informing institutional policies on credit allocation.
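
To illustrate the idea of reading author-specific macros as writing traces, here is a minimal Python sketch. It is not the paper's pipeline: it assumes that such macros are one-argument commands defined with `\newcommand` (for example, colored annotation macros) and that sections are marked with `\section{...}`. The macro names `\alice` and `\bob`, the sample text, and the function `macro_usage_by_section` are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical LaTeX source; the macro names \alice and \bob are illustrative only.
SAMPLE_TEX = r"""
\newcommand{\alice}[1]{\textcolor{red}{#1}}
\newcommand{\bob}[1]{\textcolor{blue}{#1}}
\section{Introduction}
\alice{We motivate the problem.} \alice{Prior work is limited.}
\section{Methods}
\bob{We fit a regression model.}
\section{Discussion}
\alice{The results suggest a division of labor.} \bob{Limitations remain.}
"""

# Assumption: an author-specific macro is a one-argument \newcommand or \renewcommand.
MACRO_DEF = re.compile(r"\\(?:re)?newcommand\*?\{?\\([A-Za-z]+)\}?\s*\[1\]")
SECTION = re.compile(r"\\section\*?\{([^}]*)\}")


def macro_usage_by_section(tex):
    """Count how often each user-defined one-argument macro is used in each section."""
    macros = set(MACRO_DEF.findall(tex))
    if not macros:
        return {}
    # Splitting on a pattern with one capture group yields
    # [preamble, title1, body1, title2, body2, ...].
    parts = SECTION.split(tex)
    macro_use = re.compile(r"\\(" + "|".join(sorted(macros)) + r")\{")
    usage = defaultdict(Counter)
    for title, body in zip(parts[1::2], parts[2::2]):
        for name in macro_use.findall(body):
            usage[title][name] += 1
    return {title: dict(counts) for title, counts in usage.items()}


usage = macro_usage_by_section(SAMPLE_TEX)
for title, counts in usage.items():
    print(title, counts)
```

On this toy input, the `\alice` macro appears mostly in the Introduction and Discussion and `\bob` in Methods, which is the kind of conceptual versus technical split the abstract describes. Linking a macro name to an actual author is a separate step that this sketch does not attempt.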