What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

career value

192K/year

🤖 AI Summary

This work addresses the challenge of reproducing academic papers through automated code generation, which is often hindered by the absence of tacit knowledge—such as implementation nuances and debugging insights. The study presents the first systematic formalization of three types of tacit knowledge: relational, embodied, and collective. To recover this knowledge, the authors propose a graph-based agent framework that operates through a three-stage mechanism: relation-aware aggregation, execution-feedback refinement, and graph-level knowledge induction. They also introduce an expanded version of ReplicateBench, a large-scale evaluation benchmark encompassing three domains, ten tasks, and forty papers. Experimental results demonstrate that the generated code achieves an average performance gap of only 10.04% compared to official implementations, representing a 24.68% improvement over the strongest baseline.

Technology Category

Application Category

📝 Abstract

Automated paper reproduction -- generating executable code from academic papers -- is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge -- relational, somatic, and collective -- and propose \method, a graph-based agent framework with a dedicated mechanism for each: node-level relation-aware aggregation recovers relational knowledge by analyzing implementation-unit-level reuse and adaptation relationships between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers sharing similar implementations. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, \method{} achieves an average performance gap of 10.04\% against official implementations, improving over the strongest baseline by 24.68\%. The code will be publicly released upon acceptance; the repository link will be provided in the final version.

Problem

Research questions and friction points this paper is trying to address.

tacit knowledge

automated paper reproduction

relational knowledge

somatic knowledge

collective knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

tacit knowledge recovery

graph-based agent framework

automated paper reproduction