🤖 AI Summary
System-level software packages frequently fail to build as toolchains evolve and instruction set architectures diversify, and repairs are hindered by multi-language artifacts, dependency constraints, and architecture-specific build conventions. This work presents a systematic empirical study of real-world build failures, finding that 72% stem from dependency and environment misconfigurations rather than isolated code defects. To address this, we propose EvidenT, an evidence-preserving iterative repair framework that decouples evidence management from tool execution. It combines an external reproducible build service, an evidence-preserving repair controller, and a repair orchestrator that invokes modular tools for failure localization and repair; porting to another architecture requires updating only the ISA-specific knowledge context. On 219 real-world RISC-V package build failures, EvidenT repairs 118 packages (53.88%), compared with 20.55% for agentic baselines and 1.83% for direct LLM-based repair. Preliminary experiments on aarch64 and x86_64 reach success rates of 41.77% and 46.99%, respectively.
📝 Abstract
Frequent toolchain updates and growing ISA diversity have made system-level software package repair increasingly important. Diagnosing and repairing build failures remains challenging because failures involve heterogeneous evidence, dependency constraints, and architecture-specific build conventions. While recent LLM-based repair methods show promise for project-level source fixes, they struggle with system-level repair, where failures span multi-language artifacts such as build recipes, scripts, and source archives, and require iterative validation through external build services. In this paper, we first conduct a systematic empirical study of real-world system-level build failures. We find that 72% of failures stem from dependency and environment misconfigurations rather than isolated code defects, suggesting that effective repair must prioritize packaging logic and iterative feedback. Motivated by these insights, we propose EvidenT, an evidence-preserving repair framework that decouples iteration-aware evidence management from tool execution. EvidenT includes: (1) an external Build Service for reproducible execution and feedback; (2) an Evidence-Preserving Repair Controller that fuses repair history, knowledge context, and build artifacts; and (3) an automated Repair Orchestrator that invokes modular tools for failure localization and system-level repair in a closed-loop validation environment. We evaluate EvidenT on 219 real-world RISC-V package build failures. EvidenT repairs 118 packages (53.88%), outperforming state-of-the-art agentic baselines (20.55%) and direct LLM-based repair (1.83%). To assess architectural generality, we extend EvidenT to legacy ISAs by updating only ISA-specific knowledge context. Preliminary experiments achieve success rates of 41.77% on aarch64 and 46.99% on x86_64, demonstrating robustness across diverse hardware ecosystems.
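The closed loop described in the abstract (build, collect evidence, propose a repair, rebuild) can be illustrated with a minimal sketch. This is not EvidenT's implementation: the names `Evidence`, `BuildResult`, `repair_loop`, `toy_build`, and `toy_propose` are hypothetical, the "recipe" is reduced to a set of declared dependencies, and the build service and repair tool are toy stand-ins. It only shows the idea of keeping the evidence store separate from the tools that run in each iteration.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Optional

Recipe = FrozenSet[str]  # assumption: a recipe is just its declared dependencies


@dataclass
class BuildResult:
    success: bool
    log: str


@dataclass
class Evidence:
    """Context preserved across iterations, kept apart from tool execution."""
    knowledge: List[str]                      # e.g. ISA-specific notes (hypothetical)
    history: List[dict] = field(default_factory=list)

    def record(self, candidate: Recipe, result: BuildResult) -> None:
        # Every attempt is appended; nothing is overwritten or discarded.
        self.history.append(
            {"recipe": candidate, "success": result.success, "log": result.log}
        )


def repair_loop(
    recipe: Recipe,
    build: Callable[[Recipe], BuildResult],
    propose: Callable[[Recipe, str, Evidence], Recipe],
    evidence: Evidence,
    max_iters: int = 5,
) -> Optional[Recipe]:
    """Iterate build -> propose -> rebuild until success or the budget runs out."""
    result = build(recipe)
    for _ in range(max_iters):
        if result.success:
            return recipe
        candidate = propose(recipe, result.log, evidence)
        result = build(candidate)             # validate through the build service
        evidence.record(candidate, result)
        recipe = candidate
    return recipe if result.success else None


# Toy stand-ins for the external build service and the repair tool.
REQUIRED = {"cmake", "zlib-devel"}


def toy_build(recipe: Recipe) -> BuildResult:
    missing = sorted(REQUIRED - recipe)
    if missing:
        return BuildResult(False, f"error: missing dependency {missing[0]}")
    return BuildResult(True, "build ok")


def toy_propose(recipe: Recipe, log: str, evidence: Evidence) -> Recipe:
    # Read the missing dependency name from the last token of the log line.
    return recipe | {log.rsplit(" ", 1)[-1]}


if __name__ == "__main__":
    ev = Evidence(knowledge=["hypothetical ISA-specific note"])
    fixed = repair_loop(frozenset({"gcc"}), toy_build, toy_propose, ev)
    print(sorted(fixed) if fixed else "not repaired", len(ev.history))
```

In this toy setting each failed build exposes one missing dependency, so the loop needs two repair attempts. In a real system the `propose` step would be an LLM-driven tool that reads the accumulated history and knowledge context rather than a single log line.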