Cascaded Code Editing: Large-Small Model Collaboration for Effective and Efficient Code Editing

📅 2026-04-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the accuracy-efficiency trade-off in code editing with language models. Large models are slow at inference because they regenerate entire files, most of which is unchanged code, while small models struggle with long contexts and cross-file dependencies, which limits their accuracy. To reconcile the two, the authors propose a cascaded architecture that combines a large and a small model: the large model first produces a concise edit sketch capturing the required change, and the small model then applies that sketch to the original code. Because the large model emits only the sketch, it generates far fewer tokens, which improves inference efficiency without compromising editing accuracy. The work also aims to strengthen the small model's ability to apply sketches accurately in long-context, cross-file scenarios.
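
The two-stage flow described in the summary can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cascaded_edit`, `toy_large`, and `toy_small` are hypothetical names, the diff-style sketch format (lines starting with `-` and `+`) is an assumption, and both models are replaced by deterministic stand-ins so the example runs without any model.

```python
from typing import Callable, Tuple

# Hypothetical interfaces: the source does not say how either model is invoked.
SketchModel = Callable[[str, str], str]  # (code, instruction) -> edit sketch
ApplyModel = Callable[[str, str], str]   # (code, sketch) -> full edited code


def cascaded_edit(code: str, instruction: str,
                  large: SketchModel, small: ApplyModel) -> Tuple[str, str]:
    """Stage 1: the large model writes a short sketch.
    Stage 2: the small model merges the sketch into the original code."""
    sketch = large(code, instruction)
    edited = small(code, sketch)
    return sketch, edited


def toy_large(code: str, instruction: str) -> str:
    # Stand-in for the large model: emits only the changed line,
    # in an assumed "-old / +new" format.
    return "-    return a - b\n+    return a + b"


def toy_small(code: str, sketch: str) -> str:
    # Stand-in for the small model: applies each "-old / +new" pair
    # to the first matching occurrence in the original code.
    lines = sketch.splitlines()
    removed = [line[1:] for line in lines if line.startswith("-")]
    added = [line[1:] for line in lines if line.startswith("+")]
    for old, new in zip(removed, added):
        code = code.replace(old, new, 1)
    return code


ORIGINAL = (
    "def add(a, b):\n"
    "    return a - b\n"
    "\n"
    "def sub(a, b):\n"
    "    return a - b\n"
)

sketch, edited = cascaded_edit(
    ORIGINAL, "Fix add so it returns the sum.", toy_large, toy_small
)
print(edited)
```

In this toy run the sketch is shorter than the edited file, which is the property the cascade relies on: the expensive model writes only the change, and the cheap model rewrites the rest.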

📝 Abstract
Code editing constitutes a fundamental practice in software development, wherein developers modify existing codebases according to natural language requirements. Accurate code editing necessitates a comprehensive understanding of both the existing codebase and the modification requirements. Although large language models (LLMs) have demonstrated promising performance in code editing tasks, they suffer from substantial inefficiency by generating entire modified files that largely consist of unchanged code. While smaller models could potentially address this inefficiency, they typically lack the capacity to effectively comprehend long code contexts required for accurate editing. To ensure both effectiveness and efficiency, we propose to decompose code editing into a two-stage cascade: edit sketch generation, wherein a large model first produces concise sketches representing the requisite modifications (the more challenging phase), and edit sketch application, wherein a smaller model integrates these sketches into the original code to produce the final edited code (the simpler phase). This cascaded design reduces the number of tokens generated by the large model, as the majority of the output is handled by the smaller, more efficient model, thereby enhancing overall efficiency. However, the effectiveness of this approach is constrained by current small models' limited capabilities in handling long-context scenarios and cross-file dependencies, which are essential for accurate sketch application in real-world codebases. To address these limitations and enhance smaller models' sketch application capabilities, ...
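
The efficiency argument in the abstract can be made concrete with a back-of-the-envelope cost model. The sketch below is an assumption-laden illustration: `decode_cost` is a hypothetical helper, every number is invented for the example and none comes from the paper, and prompt-processing cost is ignored. It shows that the cascade still emits the whole file, but shifts that work to the model with the lower per-token cost.

```python
def decode_cost(large_tokens: int, small_tokens: int,
                large_unit: float, small_unit: float) -> float:
    """Total generation cost: tokens emitted by each model times its per-token cost."""
    return large_tokens * large_unit + small_tokens * small_unit


# Illustrative values only (assumptions, not measurements from the paper).
FILE_TOKENS = 2000    # length of the full edited file
SKETCH_TOKENS = 100   # length of the edit sketch
LARGE_UNIT = 10.0     # relative cost per token generated by the large model
SMALL_UNIT = 1.0      # relative cost per token generated by the small model

# Baseline: the large model regenerates the entire file.
baseline = decode_cost(FILE_TOKENS, 0, LARGE_UNIT, SMALL_UNIT)

# Cascade: the large model emits the sketch, the small model emits the file.
cascade = decode_cost(SKETCH_TOKENS, FILE_TOKENS, LARGE_UNIT, SMALL_UNIT)

print(baseline, cascade)
```

Under these assumed values the cascade is cheaper, and the gap widens as the sketch gets shorter relative to the file or as the small model gets cheaper relative to the large one.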
Problem

Research questions and friction points this paper is trying to address.

code editing
large language models
small models
long-context understanding
cross-file dependencies
Innovation

Methods, ideas, or system contributions that make the work stand out.

cascaded code editing
large-small model collaboration
edit sketch
code generation efficiency
long-context modeling