๐ค AI Summary
Traditional scientific papers employ linear narratives that obscure the exploratory branches and failed attempts inherent in research, impeding AI agentsโ ability to understand, reproduce, and extend findings. To address this limitation, this work proposes the Agent-Native Research Artifact (Ara) protocolโa novel four-layer structured format that encapsulates scientific logic, executable code, exploration graphs, and evidence chains into a machine-executable package, serving as a functional replacement for conventional papers. The Ara ecosystem integrates a real-time research manager, compiler, and native review system, enabling automatic conversion between PDFs and code repositories alongside objective, automated peer review. Empirical evaluation on PaperBench and RE-Bench demonstrates substantial improvements: question-answering accuracy rises from 72.4% to 93.7%, reproduction success increases from 57.4% to 64.4%, and failed trajectories significantly accelerate progress on open-ended tasks.
๐ Abstract
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (Ara), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an Ara Compiler that translates legacy PDFs and repos into Aras; and an Ara-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, Ara raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in Ara accelerate progress, but can also constrain a capable agent from stepping outside the prior-run box depending on the agent's capabilities.