🤖 AI Summary
Scientific credit assignment has long relied on oversimplified assumptions about author order and on self-reported contributions that are prone to subjective bias. This study introduces a behavior-driven approach to quantifying individual contributions, building author-specific writing signatures from the LaTeX macros embedded in manuscripts. Applying large-scale text-provenance analysis to more than 730,000 arXiv papers, the analysis uncovers a systematic division of writing labor: first authors predominantly write the Methods and Results sections, while last authors focus on the Introduction and Discussion. Validation against Overleaf editing records, self-reported contributions, author order, and disciplinary authorship norms indicates that the method reliably infers writing activity for more than 500,000 scientists. The approach goes beyond conventional proxies such as position-based heuristics, offering an interpretable, scalable, and empirically grounded framework for scholarly credit allocation.
📝 Abstract
The recognition of individual contributions is central to the scientific reward system, yet coauthored papers often obscure who did what. Traditional proxies like author order assume a simplistic decline in contribution, while emerging practices such as self-reported roles are biased and limited in scope. We introduce a large-scale, behavior-based approach to identifying individual contributions in scientific papers. Using author-specific LaTeX macros as writing signatures, we analyze over 730,000 arXiv papers (1991-2023), covering over half a million scientists. Validated against self-reports, author order, disciplinary norms, and Overleaf records, our method reliably infers author-level writing activity. Section-level traces reveal a hidden division of labor: first authors focus on technical sections (e.g., Methods, Results), while last authors primarily contribute to conceptual sections (e.g., Introduction, Discussion). Our findings offer empirical evidence of labor specialization at scale and new tools to improve credit allocation in collaborative research.
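To make the idea of section-level traces concrete, the following is a minimal sketch, not the paper's actual pipeline. It assumes each author's signature is already known as a plain set of macro names (how those signatures are derived is not specified here), and it simply counts how often each author's macros appear in each `\section` of a manuscript. The function names, the `signatures` mapping, and the toy manuscript are all hypothetical.

```python
import re
from collections import Counter

# Matches top-level \section{...} or \section*{...} headings (nested braces not handled).
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")
# Matches control words such as \vx or \ie (letters only).
MACRO_RE = re.compile(r"\\([A-Za-z]+)")


def split_sections(tex):
    """Return {section title: body text} for top-level \\section commands."""
    # With one capturing group, re.split yields
    # [preamble, title1, body1, title2, body2, ...].
    parts = SECTION_RE.split(tex)
    return {parts[i].strip(): parts[i + 1] for i in range(1, len(parts), 2)}


def attribute_sections(tex, signatures):
    """Count, per section, how often each author's signature macros occur.

    `signatures` maps an author label to a set of macro names (without
    the backslash). This mapping is an assumed input for illustration.
    """
    result = {}
    for title, body in split_sections(tex).items():
        used = Counter(MACRO_RE.findall(body))
        result[title] = {
            author: sum(used[m] for m in macros)
            for author, macros in signatures.items()
        }
    return result


# Toy manuscript: \vx and \ie stand in for author-specific macros.
example = r"""
\section{Introduction}
We study \ie collaboration \ie at scale.
\section{Methods}
Let \vx be a vector and \vx its copy, \ie a duplicate.
"""

signatures = {"first_author": {"vx"}, "last_author": {"ie"}}
counts = attribute_sections(example, signatures)
print(counts)
```

In this toy case the hypothetical last author's macro dominates the Introduction and the first author's macro dominates Methods, mirroring the pattern the abstract describes. A real analysis would also need to handle shared macros, `\input` files, and macros that appear in standard packages, none of which this sketch attempts.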