Hidden Division of Labor in Scientific Teams Revealed Through 1.6 Million LaTeX Files

📅 2025-02-11

🤖 AI Summary

Individual contributions in scientific teams remain difficult to identify, because conventional proxies such as author order and self-reported contribution statements suffer from bias and incomplete coverage. Method: Using LaTeX source files from 1.6 million papers (1991–2023) by 2 million scientists, the authors carry out a large-scale, source-level analysis of the implicit division of labor in scientific writing, distinguishing conceptual sections (e.g., Introduction, Discussion) from technical sections (e.g., Methods, Experiments). Author-specific macros serve as fine-grained writing traces that attribute text to individual authors at the section level, without relying on author position or self-reported declarations. Contribution/Results: The resulting data are validated against self-reported contribution statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman's ρ = 0.6, *p* < 0.05). The paper presents the first large-scale dataset of writing contributions and shows that some authors contribute mainly to conceptual sections while others focus on technical ones, with implications for authorship practices and institutional policies on credit allocation.

📝 Abstract

Recognition of individual contributions is fundamental to the scientific reward system, yet coauthored papers obscure who did what. Traditional proxies (author order and career stage) reinforce biases, while contribution statements remain self-reported and limited to select journals. We construct the first large-scale dataset on writing contributions by analyzing author-specific macros in LaTeX files from 1.6 million papers (1991–2023) by 2 million scientists. Validation against self-reported statements (precision = 0.87), author order patterns, field-specific norms, and Overleaf records (Spearman's rho = 0.6, p < 0.05) confirms the reliability of the created data. Using explicit section information, we reveal a hidden division of labor within scientific teams: some authors primarily contribute to conceptual sections (e.g., Introduction and Discussion), while others focus on technical sections (e.g., Methods and Experiments). These findings provide the first large-scale evidence of implicit labor division in scientific teams, challenging conventional authorship practices and informing institutional policies on credit allocation.
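To make the idea of author-specific macros concrete, here is a minimal Python sketch of how macro usage might be tallied per section. It is an illustration under stated assumptions, not the paper's pipeline: the sample LaTeX source, the macro names (`\alice`, `\bob`), the macro-to-author mapping, the helper names, and the conceptual/technical section sets are all hypothetical. The abstract does not say how macros are matched to authors, so the mapping is supplied by hand here.

```python
import re
from collections import defaultdict

# Hypothetical LaTeX source: each coauthor defines a one-argument macro
# and wraps their own text in it. Names and content are illustrative only.
SAMPLE_TEX = r"""
\newcommand{\alice}[1]{\textcolor{red}{#1}}
\newcommand{\bob}[1]{\textcolor{blue}{#1}}
\section{Introduction}
\alice{We motivate the problem.} \alice{Prior work is limited.}
\section{Methods}
\bob{We parse the source files.}
\section{Discussion}
\alice{Implications follow.} \bob{Limitations remain.}
"""

# Matches \newcommand{\name}[1] or \renewcommand{\name}[1] (one-argument macros).
MACRO_DEF = re.compile(r"\\(?:newcommand|renewcommand)\*?\{?\\([A-Za-z]+)\}?\s*\[1\]")
SECTION = re.compile(r"\\section\*?\{([^}]*)\}")

# Assumed grouping, following the examples given in the abstract.
CONCEPTUAL = {"Introduction", "Discussion"}
TECHNICAL = {"Methods", "Experiments"}


def find_single_arg_macros(tex):
    """Return names of one-argument macros defined in the source."""
    return MACRO_DEF.findall(tex)


def count_macro_uses_by_section(tex, macro_to_author):
    """Count uses of each author's macro under each \\section heading."""
    counts = defaultdict(lambda: defaultdict(int))
    current = None
    for line in tex.splitlines():
        stripped = line.strip()
        # Skip macro definitions so they are not counted as uses.
        if stripped.startswith(("\\newcommand", "\\renewcommand")):
            continue
        match = SECTION.search(line)
        if match:
            current = match.group(1)
        if current is None:
            continue
        for macro, author in macro_to_author.items():
            # Negative lookahead avoids matching longer names such as \alicex.
            pattern = r"\\" + re.escape(macro) + r"(?![A-Za-z])"
            uses = len(re.findall(pattern, line))
            if uses:
                counts[current][author] += uses
    return {section: dict(authors) for section, authors in counts.items()}


def conceptual_share(counts, author):
    """Fraction of an author's macro uses that fall in conceptual sections."""
    conceptual = sum(a.get(author, 0) for s, a in counts.items() if s in CONCEPTUAL)
    technical = sum(a.get(author, 0) for s, a in counts.items() if s in TECHNICAL)
    total = conceptual + technical
    return conceptual / total if total else None


if __name__ == "__main__":
    mapping = {"alice": "Alice", "bob": "Bob"}  # hypothetical mapping
    section_counts = count_macro_uses_by_section(SAMPLE_TEX, mapping)
    for name in mapping.values():
        print(name, conceptual_share(section_counts, name))
```

A real corpus would need more robust handling (nested braces, `\input` files, macros defined with `\def`, and comment macros that mark feedback instead of authored text), which this sketch deliberately leaves out.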

Problem

Research questions and friction points this paper is trying to address.

Identify individual contributions in coauthored papers

Analyze LaTeX files to reveal labor division

Challenge traditional authorship credit allocation methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analysis of LaTeX macros

Large-scale dataset creation

Revealing labor division
