What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

📅 2026-03-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses automated reproduction of academic papers through code generation, a task often hindered by missing tacit knowledge such as implementation nuances and debugging insights. The study formalizes three types of tacit knowledge: relational, somatic, and collective. To recover them, the authors propose a graph-based agent framework with a three-stage mechanism: relation-aware aggregation, execution-feedback refinement, and graph-level knowledge induction. They also introduce an extended version of ReproduceBench, an evaluation benchmark spanning three domains, ten tasks, and forty papers. In experiments, the generated code shows an average performance gap of only 10.04% relative to official implementations, a 24.68% improvement over the strongest baseline.
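The summary reports an "average performance gap" and an improvement over the strongest baseline, but neither metric is defined on this page. The following is a minimal sketch under two assumptions that the paper may not share: the gap for one paper is the absolute difference between the reproduced and official scores divided by the official score (in percent), averaged over papers, and the baseline improvement is the relative reduction of that average gap. The function names are hypothetical and the toy scores are illustrative, not results from the paper.

```python
from statistics import mean
from typing import Iterable, Tuple


def relative_gap(official: float, reproduced: float) -> float:
    """Percentage gap between a reproduced score and the official score.

    Assumption: gap = |official - reproduced| / |official| * 100, so the
    direction of the metric (higher- or lower-is-better) does not matter.
    """
    if official == 0:
        raise ValueError("official score must be non-zero")
    return abs(official - reproduced) / abs(official) * 100.0


def average_gap(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean relative gap over (official, reproduced) pairs, one per paper."""
    return mean(relative_gap(o, r) for o, r in pairs)


def relative_improvement(baseline_gap: float, method_gap: float) -> float:
    """Assumed reading of 'improves over the baseline by X%':
    the relative reduction of the average gap."""
    return (baseline_gap - method_gap) / baseline_gap * 100.0


if __name__ == "__main__":
    # Illustrative toy scores, NOT results from the paper.
    toy = [(0.80, 0.72), (50.0, 45.0)]
    print(round(average_gap(toy), 2))        # 10.0
    print(relative_improvement(20.0, 15.0))  # 25.0
```

If the paper instead reports the improvement in absolute percentage points, `relative_improvement` would be replaced by a plain difference of the two average gaps.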

📝 Abstract
Automated paper reproduction (generating executable code from academic papers) is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge (relational, somatic, and collective) and propose a graph-based agent framework with a dedicated mechanism for each. Node-level relation-aware aggregation recovers relational knowledge by analyzing reuse and adaptation relationships, at the level of implementation units, between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers that share similar implementations. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, the framework achieves an average performance gap of 10.04% relative to official implementations, improving over the strongest baseline by 24.68%. The code will be publicly released upon acceptance; the repository link will be provided in the final version.
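The abstract describes the three mechanisms only at a high level. As an illustration, here is a minimal Python sketch of how they could fit together, assuming each paper is reduced to a set of named implementation units inside a citation graph. All names (`Paper`, `aggregate_relational`, `refine_with_feedback`, `induce_collective`), the Jaccard threshold, and the greedy leader clustering are hypothetical stand-ins; the actual framework presumably relies on LLM agents for code analysis and repair, which the toy `run` and `repair` callables merely imitate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Paper:
    """Hypothetical citation-graph node: a paper and its implementation units."""
    pid: str
    units: Set[str]                      # e.g. module or function names
    cites: List[str] = field(default_factory=list)


def aggregate_relational(target: Paper, graph: Dict[str, Paper]) -> Dict[str, List[str]]:
    """Stage 1 sketch: for each implementation unit of the target paper, list
    the cited papers that also implement it (candidates for reuse/adaptation)."""
    hints: Dict[str, List[str]] = {u: [] for u in sorted(target.units)}
    for nid in target.cites:
        neighbor = graph.get(nid)
        if neighbor is None:             # citation outside the graph
            continue
        for unit in target.units & neighbor.units:
            hints[unit].append(nid)
    return hints


def refine_with_feedback(code: str,
                         run: Callable[[str], Tuple[bool, str]],
                         repair: Callable[[str, str], str],
                         max_iters: int = 5) -> Tuple[str, bool, int]:
    """Stage 2 sketch: execute, feed the runtime error to a repairer, repeat.
    Returns (final code, success flag, number of repairs applied)."""
    for n_repairs in range(max_iters):
        ok, error = run(code)
        if ok:
            return code, True, n_repairs
        code = repair(code, error)
    ok, _ = run(code)                    # check the last repair too
    return code, ok, max_iters


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_collective(papers: List[Paper], threshold: float = 0.5) -> List[Tuple[List[str], List[str]]]:
    """Stage 3 sketch: greedy leader clustering on Jaccard similarity of unit
    sets, then keep the units shared by every member of each multi-paper cluster."""
    clusters: List[List[Paper]] = []
    for p in papers:
        for cluster in clusters:
            if jaccard(p.units, cluster[0].units) >= threshold:
                cluster.append(p)
                break
        else:
            clusters.append([p])
    return [([p.pid for p in c], sorted(set.intersection(*(p.units for p in c))))
            for c in clusters if len(c) > 1]


if __name__ == "__main__":
    graph = {
        "A": Paper("A", {"encoder", "contrastive_loss"}),
        "B": Paper("B", {"encoder", "decoder"}),
        "T": Paper("T", {"encoder", "contrastive_loss", "sampler"}, cites=["A", "B"]),
    }
    print(aggregate_relational(graph["T"], graph))
    # {'contrastive_loss': ['A'], 'encoder': ['A', 'B'], 'sampler': []}

    def run(src: str) -> Tuple[bool, str]:
        try:
            exec(src, {})                # toy executor; never exec untrusted code
            return True, ""
        except Exception as exc:         # the runtime signal fed back to repair
            return False, f"{type(exc).__name__}: {exc}"

    def repair(src: str, error: str) -> str:
        # Toy rule standing in for an LLM-driven fix.
        if "name 'math' is not defined" in error:
            return "import math\n" + src
        return src

    print(refine_with_feedback("x = math.sqrt(4)\n", run, repair)[1:])  # (True, 1)
    print(induce_collective(list(graph.values())))
    # [(['A', 'T'], ['contrastive_loss', 'encoder'])]
```

Greedy leader clustering is used here only because it is short; any similarity-based clustering would fit the same slot, and the abstract does not say which method the authors use.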
Problem

Research questions and friction points this paper is trying to address.

tacit knowledge
automated paper reproduction
relational knowledge
somatic knowledge
collective knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

tacit knowledge recovery
graph-based agent framework
automated paper reproduction
execution-feedback refinement
knowledge induction
🔎 Similar Papers
2024-06-08 · Annual Meeting of the Association for Computational Linguistics · Citations: 2