What Papers Don't Tell You: Recovering Tacit Knowledge for Automated Paper Reproduction

2026-03-02
Citations: 0
Influential: 0
AI Summary
This work addresses automated paper reproduction, the generation of executable code from academic papers, which is hindered by missing tacit knowledge such as implementation nuances and debugging insights. The study formalizes three types of tacit knowledge: relational, somatic, and collective. To recover them, the authors propose a graph-based agent framework with one mechanism per type: node-level relation-aware aggregation, execution-feedback refinement, and graph-level knowledge induction. They also extend ReproduceBench to cover 3 domains, 10 tasks, and 40 recent papers. On this benchmark, the generated code shows an average performance gap of 10.04% against official implementations, a 24.68% improvement over the strongest baseline.

Abstract
Automated paper reproduction (generating executable code from academic papers) is bottlenecked not by information retrieval but by the tacit knowledge that papers inevitably leave implicit. We formalize this challenge as the progressive recovery of three types of tacit knowledge (relational, somatic, and collective) and propose a graph-based agent framework with a dedicated mechanism for each: node-level relation-aware aggregation recovers relational knowledge by analyzing implementation-unit-level reuse and adaptation relationships between the target paper and its citation neighbors; execution-feedback refinement recovers somatic knowledge through iterative debugging driven by runtime signals; and graph-level knowledge induction distills collective knowledge from clusters of papers sharing similar implementations. On an extended ReproduceBench spanning 3 domains, 10 tasks, and 40 recent papers, the framework achieves an average performance gap of 10.04% against official implementations, improving over the strongest baseline by 24.68%. The code will be publicly released upon acceptance; the repository link will be provided in the final version.
Problem

Research questions and friction points this paper is trying to address.

tacit knowledge
automated paper reproduction
relational knowledge
somatic knowledge
collective knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

tacit knowledge recovery
graph-based agent framework
automated paper reproduction
execution-feedback refinement
knowledge induction
Authors

Lehui Li
School of Software, Shandong University
Ruining Wang
School of Software, Shandong University
Haochen Song
School of Software, Shandong University
Yaoxin Mao
Beijing Institute of Technology
Tong Zhang
Zhejiang University
AI Security, Video Understanding
Yuyao Wang
Dept. of Math & Statistics, Boston University
Jiayi Fan
School of Software, Shandong University
Yitong Zhang
College of AI, Tsinghua University
Jieping Ye
Alibaba Group
Chengqi Zhang
Chair Professor of Artificial Intelligence
Data Mining
Yongshun Gong
Professor at the School of Software, Shandong University, China
Urban Computing, Spatio-temporal Data Mining, Spatio-Temporal AI, Pattern Mining