🤖 AI Summary
This work addresses the challenge of maintaining effective context in repository-level iterative code generation, where existing approaches often fail to retain previously fixed errors and impose high cognitive overhead. To overcome these limitations, we propose CodeMEM, a dynamic memory management system guided by abstract syntax trees (ASTs). CodeMEM introduces two core components—code context memory and code session memory—that jointly enable code-centric, dynamic modeling of interaction history, facilitating precise context updates and forgetting detection. By moving beyond conventional natural language–centric memory paradigms, CodeMEM achieves state-of-the-art performance on CodeIF-Bench and CoderEval, improving instruction-following accuracy by 12.2% in the current turn and session-level performance by 11.5%, while reducing interaction rounds by 2–3, all with efficient inference and low token consumption.
📝 Abstract
Large language models (LLMs) substantially enhance developer productivity in repository-level code generation through interactive collaboration. However, as interactions progress, repository context must be continuously preserved and updated to integrate newly validated information. Meanwhile, the expanding session history increases cognitive burden, often leading to forgetting and the reintroduction of previously resolved errors. Existing memory management approaches show promise but remain limited by natural language–centric representations. To overcome these limitations, we propose CodeMEM, an AST-guided dynamic memory management system tailored for repository-level iterative code generation. Specifically, CodeMEM introduces the Code Context Memory component, which dynamically maintains and updates repository context through AST-guided LLM operations, along with the Code Session Memory component, which constructs a code-centric representation of interaction history and explicitly detects and mitigates forgetting through AST-based analysis. Experimental results on the instruction-following benchmark CodeIF-Bench and the code generation benchmark CoderEval demonstrate that CodeMEM achieves state-of-the-art performance, improving instruction following by 12.2% for the current turn and 11.5% for the session level, and reducing interaction rounds by 2–3, while maintaining competitive inference latency and token efficiency.
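The abstract does not spell out how AST-based forgetting detection works; one plausible minimal sketch, using Python's standard `ast` module, is to keep location-free AST dumps of function bodies that were previously fixed and flag any current function whose structure matches a known buggy snapshot (all function and variable names here are illustrative assumptions, not CodeMEM's actual API):

```python
import ast

def function_asts(source: str) -> dict[str, str]:
    """Map each top-level function name to a location-free AST dump,
    so comparisons ignore formatting and line numbers."""
    tree = ast.parse(source)
    return {
        node.name: ast.dump(node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }

def detect_forgetting(current_code: str, buggy_history: dict[str, str]) -> list[str]:
    """Return names of functions whose current AST is identical to a
    previously fixed buggy version, i.e. a resolved error was reintroduced."""
    current = function_asts(current_code)
    return [
        name for name, dump in current.items()
        if buggy_history.get(name) == dump
    ]

# Snapshot of a bug that was fixed in an earlier turn.
buggy_history = function_asts("def add(a, b):\n    return a - b")

fixed_code = "def add(a, b):\n    return a + b"
regressed_code = "def add(a, b):\n    return a - b"  # bug reintroduced

print(detect_forgetting(fixed_code, buggy_history))      # no regression
print(detect_forgetting(regressed_code, buggy_history))  # ['add']
```

In practice a system like CodeMEM would presumably compare richer subtree features rather than exact dumps, but structural (rather than textual) matching is what makes the detection robust to superficial edits.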