🤖 AI Summary
This work addresses the challenge of configuration failures in large-scale software environments, which often stem from complex errors and are inadequately handled by existing approaches due to their lack of fine-grained analysis of agent behaviors and effective repair mechanisms. To overcome this limitation, we propose EvoConfig, a novel framework that introduces a self-evolving multi-agent collaboration mechanism integrated with a fine-grained expert diagnosis module. This enables precise post-execution error identification and the generation of targeted repair strategies, further enhanced by dynamic priority adjustment to optimize the repair process. Evaluated on the EnvBench benchmark, EvoConfig achieves a success rate of 78.1%, outperforming the current state-of-the-art method, Repo2Run, by 7.1%, and demonstrates significant improvements in both error identification accuracy and the effectiveness of repair recommendations.
📝 Abstract
A reliable executable environment is the foundation for large language models to solve software engineering tasks. Because constructing such environments is complex and tedious, large-scale configuration remains relatively inefficient. Moreover, most existing methods overlook fine-grained analysis of the actions performed by the agent, which makes complex errors difficult to handle and leads to configuration failures. To address this bottleneck, we propose EvoConfig, an efficient environment configuration framework that optimizes multi-agent collaboration to build correct runtime environments. EvoConfig features an expert diagnosis module for fine-grained post-execution analysis and a self-evolving mechanism that lets expert agents give feedback on their own behavior and dynamically adjust error-fixing priorities in real time. Empirically, EvoConfig matches the previous state-of-the-art Repo2Run on Repo2Run's 420 repositories while delivering clear gains on harder cases: on the more challenging EnvBench, EvoConfig achieves a 78.1% success rate, outperforming Repo2Run by 7.1%. Beyond end-to-end success, EvoConfig also demonstrates stronger debugging competence, achieving higher accuracy in error identification and producing more effective repair recommendations than existing methods.
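The abstract does not specify how error-fixing priorities are adjusted, so the following is only a minimal sketch of one way such a feedback loop could look. The class `RepairPlanner`, the error category names, the neutral starting priority of 1.0, and the fixed adjustment step are all hypothetical and are not taken from EvoConfig.

```python
from dataclasses import dataclass, field


@dataclass
class RepairPlanner:
    """Hypothetical planner that ranks diagnosed error categories by priority."""

    priorities: dict = field(default_factory=dict)
    step: float = 0.5  # assumed size of each priority adjustment

    def rank(self, diagnosed):
        # Highest priority first; unseen categories start at a neutral 1.0.
        # sorted() is stable, so ties keep the order given by the diagnosis.
        return sorted(diagnosed, key=lambda c: -self.priorities.get(c, 1.0))

    def feedback(self, category, fixed):
        # Assumed rule: raise the priority of a category whose repair worked,
        # lower it otherwise, and never let it drop below zero.
        current = self.priorities.get(category, 1.0)
        delta = self.step if fixed else -self.step
        self.priorities[category] = max(0.0, current + delta)


planner = RepairPlanner()
errors = ["missing_system_lib", "version_conflict", "missing_python_package"]

# Post-execution outcomes reported back to the planner (illustrative values).
planner.feedback("version_conflict", fixed=True)
planner.feedback("missing_system_lib", fixed=False)

ranked = planner.rank(errors)
print(ranked)
```

In this sketch the ordering of the next repair attempt changes as outcomes come in, which is the general idea behind adjusting priorities in real time. The actual update rule used by EvoConfig may differ substantially.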