🤖 AI Summary
LLM-based code agents typically treat a codebase as unstructured text and edit it through brittle string matching, which fails under formatting drift or ambiguous patterns. The authors instead model the codebase as a structured action space in which agents operate on named abstract syntax tree (AST) entities. In their framework, CODESTRUCT, readCode retrieves complete syntactic units, while editCode applies syntax-validated transformations to semantic program elements. On SWE-Bench Verified across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2–5.0% and reduces token consumption by 12–38% for most models. Models that often fail to produce valid patches through text-based interfaces gain the most: GPT-5-nano improves by 20.8% as its empty-patch rate drops from 46.6% to 7.2%. On CodeAssistBench, accuracy gains range from 0.8% to 4.4%, with cost reductions of up to 33%.
📝 Abstract
LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode for applying syntax-validated transformations to semantic program elements. Evaluated on SWE-Bench Verified across six LLMs, CODESTRUCT improves Pass@1 accuracy by 1.2–5.0% while reducing token consumption by 12–38% for most models. Models that frequently fail to produce valid patches under text-based interfaces benefit most: GPT-5-nano improves by 20.8% as empty-patch failures drop from 46.6% to 7.2%. On CodeAssistBench, we observe consistent accuracy gains (+0.8% to +4.4%) with cost reductions up to 33%. Our results show that structure-aware interfaces offer a more reliable foundation for code agents.
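To make the idea of operating on named AST entities concrete, here is a minimal sketch built on Python's standard `ast` module. It is not the CODESTRUCT implementation: the function names `read_code` and `edit_code`, the dotted-name addressing scheme, and the choice to validate by re-parsing the whole file are all assumptions made for illustration, loosely mirroring the readCode and editCode actions described above.

```python
import ast
import textwrap

_ENTITY_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def _find_entity(tree: ast.Module, qualified_name: str) -> ast.AST:
    """Locate a function or class by dotted name, e.g. 'Account.deposit'."""
    body = tree.body
    node = None
    for part in qualified_name.split("."):
        node = next(
            (n for n in body if isinstance(n, _ENTITY_TYPES) and n.name == part),
            None,
        )
        if node is None:
            raise KeyError(f"no entity named {qualified_name!r}")
        body = node.body
    return node


def _span(node: ast.AST) -> tuple:
    """Return (first_line, last_line), 1-indexed, including decorators."""
    start = min([node.lineno] + [d.lineno for d in node.decorator_list])
    return start, node.end_lineno


def read_code(source: str, qualified_name: str) -> str:
    """Hypothetical analogue of readCode: return one complete syntactic unit."""
    start, end = _span(_find_entity(ast.parse(source), qualified_name))
    return "\n".join(source.splitlines()[start - 1:end])


def edit_code(source: str, qualified_name: str, new_code: str) -> str:
    """Hypothetical analogue of editCode: replace an entity, then validate."""
    node = _find_entity(ast.parse(source), qualified_name)
    start, end = _span(node)
    lines = source.splitlines()
    indent = " " * node.col_offset  # keep the entity's original nesting depth
    body = textwrap.dedent(new_code).strip("\n").splitlines()
    replacement = [indent + line if line.strip() else line for line in body]
    updated = "\n".join(lines[:start - 1] + replacement + lines[end:]) + "\n"
    ast.parse(updated)  # raises SyntaxError if the edit broke the file
    return updated


SOURCE = '''\
class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
'''

NEW_DEPOSIT = '''
def deposit(self, amount):
    if amount <= 0:
        raise ValueError("amount must be positive")
    self.balance += amount
'''

if __name__ == "__main__":
    print(read_code(SOURCE, "Account.deposit"))
    print(edit_code(SOURCE, "Account.deposit", NEW_DEPOSIT))
```

Because the edit is addressed by entity name rather than by a search string, it does not depend on the agent reproducing the file's exact whitespace, and a replacement that fails to parse is rejected before it reaches disk. A production system would likely need a multi-language parser and finer-grained targets than whole functions and classes; this sketch covers only Python definitions.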