🤖 AI Summary
Problem: Large language model (LLM)-based agents struggle to faithfully reproduce AI research and often produce non-executable code. The causes are insufficient background knowledge, the inability of standard RAG to capture implicit technical details in referenced papers, and the lack of multi-granular, executable knowledge representations. Method: This paper proposes Executable Knowledge Graphs (xKG), a modular, pluggable knowledge base that integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. xKG supports fine-grained retrieval and cross-task reuse, and is constructed end-to-end via joint information extraction, structured knowledge modeling, and RAG enhancement. Contribution/Results: Integrated into three agent frameworks with two different LLMs on the PaperBench benchmark, xKG yields performance gains of up to 10.9% (with o3-mini), indicating that it is an effective and general way to improve automated AI research replication.
📝 Abstract
Replicating AI research is a crucial yet challenging task for large language model (LLM) agents. Existing approaches often struggle to generate executable code, primarily due to insufficient background knowledge and the limitations of retrieval-augmented generation (RAG) methods, which fail to capture latent technical details hidden in referenced papers. Furthermore, previous approaches tend to overlook valuable implementation-level code signals and lack structured knowledge representations that support multi-granular retrieval and reuse. To overcome these challenges, we propose Executable Knowledge Graphs (xKG), a modular and pluggable knowledge base that automatically integrates technical insights, code snippets, and domain-specific knowledge extracted from scientific literature. When integrated into three agent frameworks with two different LLMs, xKG shows substantial performance gains (10.9% with o3-mini) on PaperBench, demonstrating its effectiveness as a general and extensible solution for automated AI research replication. Code will be released at https://github.com/zjunlp/xKG.
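To make the idea of a knowledge base that pairs technical insights with code snippets and supports multi-granular retrieval more concrete, here is a minimal toy sketch. It is not the xKG implementation: the class names (`PaperNode`, `TechniqueNode`), the `retrieve` function, the word-overlap scoring, and the example data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechniqueNode:
    """A technique paired with a runnable snippet (hypothetical schema)."""
    name: str
    insight: str
    code: str


@dataclass
class PaperNode:
    """A paper linked to the techniques it describes (hypothetical schema)."""
    title: str
    summary: str
    techniques: List[TechniqueNode] = field(default_factory=list)


def _overlap(query: str, text: str) -> int:
    """Count distinct lowercase words shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(papers, query, granularity="technique"):
    """Return the best match at paper or technique level (naive word overlap)."""
    if granularity == "paper":
        return max(papers, key=lambda p: _overlap(query, p.title + " " + p.summary))
    techniques = [t for p in papers for t in p.techniques]
    return max(techniques, key=lambda t: _overlap(query, t.name + " " + t.insight))


# Made-up example data, not taken from the paper.
graph = [
    PaperNode(
        title="Toy paper on optimization",
        summary="studies gradient clipping for stable training",
        techniques=[
            TechniqueNode(
                name="gradient clipping",
                insight="rescale gradient when its norm exceeds a threshold",
                code=(
                    "def clip(g, c):\n"
                    "    n = sum(x * x for x in g) ** 0.5\n"
                    "    return [x * min(1.0, c / n) for x in g] if n else g"
                ),
            ),
            TechniqueNode(
                name="learning rate warmup",
                insight="increase learning rate linearly during early steps",
                code=(
                    "def warmup(step, total, lr):\n"
                    "    return lr * min(1.0, step / total)"
                ),
            ),
        ],
    )
]

# Fine-grained retrieval: fetch one technique, then run its attached snippet.
hit = retrieve(graph, "linear warmup of the learning rate")
namespace = {}
exec(hit.code, namespace)  # only safe for trusted snippets
```

Calling `retrieve(graph, ..., granularity="paper")` returns the coarser paper-level node instead. A real system would presumably replace the word-overlap score with a proper retriever and sandbox any code it executes.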