🤖 AI Summary
Large language models (LLMs) exhibit low precision and poor interpretability when repairing real-world software defects. Method: This paper proposes a structured, iterative patch-generation framework grounded in the ReAct reasoning paradigm, decomposing repair into three closed-loop phases: diagnosis, modification, and test-based validation. It replaces RAG and embedding-based retrieval with lightweight heuristic rules and integrates developer tooling (e.g., Git interfaces, CI/CD test suites) to guide LLMs toward precise, localized code changes. Contribution/Results: The framework enables a modular, interpretable, and empirically verifiable debugging workflow. Evaluated on SWE-bench Lite, it significantly improves patch correctness and human comprehensibility. Crucially, it is the first work to empirically validate the feasibility and robustness of end-to-end, ReAct-driven, engineering-grade repair in authentic Git repository environments.
📝 Abstract
Large Language Models (LLMs) have shown strong capabilities in code generation and comprehension, yet their application to complex software engineering tasks often suffers from low precision and limited interpretability. We present Repeton, a fully open-source framework that leverages LLMs for precise and automated code manipulation in real-world Git repositories. Rather than generating holistic fixes, Repeton operates through a structured patch-and-test pipeline: it iteratively diagnoses issues, proposes code changes, and validates each patch through automated testing. This stepwise process is guided by lightweight heuristics and development tools, avoiding reliance on embedding-based retrieval systems. Evaluated on the SWE-bench Lite benchmark, our method performs favorably against RAG-based methods in both patch validity and interpretability. By decomposing software engineering tasks into modular, verifiable stages, Repeton provides a practical path toward scalable and transparent autonomous debugging.
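The patch-and-test pipeline described above can be sketched as a simple control loop. This is a minimal illustration, not Repeton's actual API: the function names (`diagnose`, `propose`, `apply_and_test`), the `Patch` structure, and the retry budget are all assumptions; in the real system the first two would be LLM calls and the third would apply the patch in the repository and run the test suite.

```python
# Hypothetical sketch of an iterative patch-and-test loop (names and
# signatures are illustrative assumptions, not Repeton's real interface).
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Patch:
    """A localized code change: replace `old` with `new` in `file`."""
    file: str
    old: str
    new: str


def repair_loop(
    diagnose: Callable[[str], str],           # e.g. LLM: feedback -> fault hypothesis
    propose: Callable[[str], Patch],          # e.g. LLM: hypothesis -> localized patch
    apply_and_test: Callable[[Patch], bool],  # tooling: apply patch, run test suite
    issue: str,
    max_iters: int = 5,
) -> Optional[Patch]:
    """Iterate diagnose -> propose -> validate until tests pass or budget runs out."""
    feedback = issue
    for _ in range(max_iters):
        hypothesis = diagnose(feedback)
        patch = propose(hypothesis)
        if apply_and_test(patch):
            return patch  # tests pass: accept this patch
        # Fold the failed attempt back into the context for the next round.
        feedback = f"{issue}\nPrevious attempt failed: {patch.new!r}"
    return None  # budget exhausted without a validated patch
```

The key design point the abstract emphasizes is the closed loop: every proposed change is checked against the repository's own tests before being accepted, so the process is verifiable at each step rather than only at the end.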