Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

📅 2026-04-23

🤖 AI Summary
Can empirical results in social science be reproduced automatically from only a paper's methods description and its original data? This work proposes an end-to-end agentic reproduction system in which large language models extract structured methods descriptions from papers and agents reimplement the analyses under strict information isolation, never seeing the original code, results, or paper. Reproduced outputs are compared with the original results deterministically at the cell level, and an error attribution step traces discrepancies to their root causes. In an evaluation of four agent scaffolds and four LLMs on 48 papers with human-verified reproducibility, agents largely recover the published results, but performance varies substantially across models, scaffolds, and papers. Failures stem from two sources: errors made by the agents and underspecified methods in the papers themselves.

📝 Abstract
Recent work has used LLM agents to reproduce empirical social science results with access to both the data and code. We broaden this scope by asking: Can they reproduce results given only a paper's methods description and original data? We develop an agentic reproduction system that extracts structured methods descriptions from papers, runs reimplementations under strict information isolation -- agents never see the original code, results, or paper -- and enables deterministic, cell-level comparison of reproduced outputs to the original results. An error attribution step traces discrepancies through the system chain to identify root causes. Evaluating four agent scaffolds and four LLMs on 48 papers with human-verified reproducibility, we find that agents can largely recover published results, but performance varies substantially between models, scaffolds, and papers. Root cause analysis reveals that failures stem both from agent errors and from underspecification in the papers themselves.
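
The abstract does not say how the deterministic, cell-level comparison is implemented. The Python sketch below is one plausible reading, not the authors' method. It assumes each result table can be flattened into a mapping from a (row, column) label pair to a number, and that a reproduced cell counts as a match when it lies within an absolute tolerance of the original. The names `compare_cells` and `match_rate`, the tolerance value, and the example numbers are all hypothetical.

```python
import math


def compare_cells(original, reproduced, abs_tol=0.005):
    """Label every cell as match, mismatch, missing, or extra.

    Both arguments map (row_label, column_label) to a numeric value.
    abs_tol is a hypothetical tolerance meant to absorb rounding in
    published tables; it is not a value taken from the paper.
    """
    report = {}
    for key in sorted(set(original) | set(reproduced)):
        if key not in reproduced:
            report[key] = "missing"      # the agent produced no value
        elif key not in original:
            report[key] = "extra"        # the agent produced an unexpected cell
        elif math.isclose(original[key], reproduced[key],
                          rel_tol=0.0, abs_tol=abs_tol):
            report[key] = "match"
        else:
            report[key] = "mismatch"
    return report


def match_rate(report):
    """Share of original cells that were reproduced within tolerance."""
    scored = [status for status in report.values() if status != "extra"]
    if not scored:
        return 0.0
    return sum(status == "match" for status in scored) / len(scored)


# Illustrative values only; they do not come from any evaluated paper.
original = {("treatment", "coef"): 0.42,
            ("treatment", "se"): 0.10,
            ("n", "value"): 1200}
reproduced = {("treatment", "coef"): 0.421,
              ("treatment", "se"): 0.15}

report = compare_cells(original, reproduced)
for cell, status in report.items():
    print(cell, status)
print(match_rate(report))
```

Because the comparison depends only on the two tables and a fixed tolerance, the same inputs always give the same verdict. A per-cell report of this kind is also the sort of input an error attribution step could start from when tracing a discrepancy back to its cause.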
Problem

Research questions and friction points this paper is trying to address.

reproducibility
LLM agents
social science
methods description
result replication
Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic reproduction
LLM agents
method extraction
deterministic comparison
error attribution