SetupX: Can LLM Agents Learn from Past Failures in Functionality-Correct Code Repository Setup?

📅 2026-05-25
🤖 AI Summary
Existing LLM agents struggle with the diverse failure modes of code repository environment configuration, such as dependency conflicts and missing toolchains, and lack both cross-repository experience transfer and a safe way to handle irreversible operations. This work proposes SetupX, an experience-driven setup framework that transfers knowledge across repositories through self-evolving, dual-modality experience representation units (XPUs), supports safe rollback and speculative execution via a LIFO Docker snapshot stack, and improves verification reliability by decoupling evidence collection from final judgment in a prosecutor-judge verification protocol. On the authors' benchmarks, the approach achieves a 92% pass rate, outperforming the strongest baseline by over 19%, and performs especially well on complex multi-repository, multi-container setups.
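To make the separation of evidence collection from final judgment concrete, here is a minimal Python sketch. It is not the SetupX implementation: the `Evidence` record, the `prosecute` and `judge` functions, the verdict labels, and the `SETUP_MARKERS` heuristics are all hypothetical names chosen for illustration. The only idea taken from the summary is that one role gathers evidence and a separate role issues the verdict.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Evidence:
    """One observed command run (hypothetical record layout)."""
    command: str
    exit_code: int
    output: str


def prosecute(logs: List[Evidence]) -> List[Evidence]:
    # Evidence collection only: keep the failing runs, issue no verdict.
    return [e for e in logs if e.exit_code != 0]


# Hypothetical markers of environment problems; not taken from the paper.
SETUP_MARKERS = ("ModuleNotFoundError", "command not found")


def judge(evidence: List[Evidence]) -> str:
    # Final judgment, based only on the evidence handed over.
    if not evidence:
        return "pass"
    if any(m in e.output for e in evidence for m in SETUP_MARKERS):
        return "setup_failure"   # the environment is likely misconfigured
    return "repository_bug"      # the failure likely lies in the repository itself


logs = [Evidence("pytest tests/", 1, "ModuleNotFoundError: No module named 'yaml'")]
verdict = judge(prosecute(logs))
```

Keeping `judge` blind to everything except the collected evidence is the design point here: the verdict cannot be influenced by whichever component ran the commands.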
📝 Abstract
Functionality-correct repository setup aims to configure execution environments (e.g., dependencies, build scripts) so that a repository's documented features execute successfully. It presents significant challenges due to diverse, repository-specific failures, including dependency incompatibilities, missing toolchains, incomplete installations, and verification-strategy mismatches. Existing LLM agents struggle to resolve these issues robustly; specifically, they fail to support (1) cross-repository experience transfer, (2) multi-step trial-and-repair under non-invertible state changes, and (3) robust verification of setup outcomes that distinguishes setup-induced failures from repository bugs. To address this, we introduce SetupX, an experiential learning-based setup framework. First, we construct a Self-Evolving Experience Representation (XPU), a dual-modality knowledge unit that encodes setup signals, textual guidance, and executable actions in order to dynamically transfer verified environment fixes to unseen repositories. Second, we employ Experience-Augmented Speculative Execution backed by a LIFO Docker snapshot stack, enabling the agent to proactively trial fixes and safely roll back to known-good states. Third, we introduce a Prosecutor-Judge Verification Protocol that separates evidence collection from final judgment, enabling more reliable setup verification beyond superficial build-time metrics. Evaluation results on carefully crafted benchmarks show that SetupX achieves the highest performance (e.g., a 92% pass rate) and outperforms the strongest baseline by over 19%. Crucially, SetupX excels in complex multi-repository setups that require coordinating multiple interconnected services across different containers. The code repository is available at https://github.com/OpenDataBox/SetupX.
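The trial-and-rollback idea behind the LIFO snapshot stack can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: a plain dictionary stands in for container state, a deep copy stands in for a Docker snapshot, and `SnapshotStack`, `try_fix`, and the example fixes are hypothetical names.

```python
import copy
from typing import Callable, Dict, List

Env = Dict[str, str]  # hypothetical stand-in for container state: package -> version


class SnapshotStack:
    """LIFO stack of environment snapshots (in-memory stand-in for Docker snapshots)."""

    def __init__(self, env: Env) -> None:
        self.env = env
        self._stack: List[Env] = []

    def push(self) -> None:
        # Save a copy of the current state as the newest known-good snapshot.
        self._stack.append(copy.deepcopy(self.env))

    def rollback(self) -> None:
        # Restore the most recently saved snapshot (last in, first out).
        self.env = self._stack.pop()

    def commit(self) -> None:
        # Keep the current state and discard the snapshot taken before the trial.
        self._stack.pop()


def try_fix(stack: SnapshotStack,
            fix: Callable[[Env], None],
            check: Callable[[Env], bool]) -> bool:
    """Speculatively apply a fix; keep it only if the check passes."""
    stack.push()
    try:
        fix(stack.env)
        ok = check(stack.env)
    except Exception:
        ok = False
    if ok:
        stack.commit()
    else:
        stack.rollback()
    return ok


def bad_fix(env: Env) -> None:
    env["numpy"] = "1.19"
    del env["python"]  # destructive change that a later command could not undo


def good_fix(env: Env) -> None:
    env["numpy"] = "1.26"


def check(env: Env) -> bool:
    return "python" in env and env.get("numpy") == "1.26"


stack = SnapshotStack({"python": "3.11"})
results = [try_fix(stack, bad_fix, check), try_fix(stack, good_fix, check)]
```

In this run the first fix fails its check and is rolled back, so the second fix starts from the original state; only the passing fix leaves a lasting change in `stack.env`.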
Problem

Research questions and friction points this paper is trying to address.

repository setup
LLM agents
environment configuration
failure learning
setup verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

experiential learning
cross-repository transfer
speculative execution
Docker snapshot stack
verification protocol