🤖 AI Summary
This work addresses the challenge of code breakage caused by frequent third-party API changes, a problem that large language models (LLMs) handle poorly due to their limited structural understanding of API evolution. To overcome this, the authors propose a knowledge graph–enhanced framework that jointly constructs static and dynamic API knowledge graphs to model both intra-version structural relationships and inter-version evolutionary transitions. Code migration is formulated as two synergistic stages: evolutionary path retrieval and path-guided code generation. The framework leverages synthetically generated supervision signals derived from real-world API differences to enable end-to-end training. Experimental results demonstrate that the approach significantly outperforms standard LLMs on both single-package and multi-package benchmarks, achieving notable improvements in migration accuracy, controllability, and execution success rate.
📝 Abstract
Code evolution is inevitable in modern software development. Changes to third-party APIs frequently break existing code and complicate maintenance, posing practical challenges for developers. While large language models (LLMs) have shown promise in code generation, they lack a structured representation of how APIs evolve across versions and therefore often emit calls to outdated or invalid APIs. In this work, we propose a knowledge graph-augmented framework that decomposes the migration task into two synergistic stages: evolution path retrieval and path-informed code generation. Our approach constructs static and dynamic API graphs to model intra-version structures and cross-version transitions, enabling structured reasoning over API evolution. Both modules are trained with synthetic supervision automatically derived from real-world API diffs, ensuring scalability and minimal human effort. Extensive experiments on single-package and multi-package benchmarks demonstrate that our framework significantly improves migration accuracy, controllability, and execution success over standard LLM baselines. The source code and datasets are available at: https://github.com/kangjz1203/KCoEvo.
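To make the "dynamic API graph" and "evolution path retrieval" ideas concrete, here is a minimal, hypothetical sketch (not the authors' implementation): cross-version API transitions are stored as directed edges between `(api_name, version)` nodes, and a breadth-first search recovers the chain of replacements that could then be handed to an LLM as migration guidance. All API names and version numbers below are illustrative assumptions.

```python
from collections import deque

# Hypothetical dynamic API graph: each edge maps an API symbol in one
# version to its replacement(s) in a later version. Illustrative data only.
evolution_edges = {
    ("pkg.old_fn", "1.0"): [("pkg.mid_fn", "2.0")],
    ("pkg.mid_fn", "2.0"): [("pkg.new_fn", "3.0")],
}

def retrieve_evolution_path(api, src_version, dst_version):
    """BFS over cross-version edges; returns the list of (api, version)
    nodes from the source API to its counterpart in dst_version, or None."""
    start = (api, src_version)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node[1] == dst_version:
            return path
        for nxt in evolution_edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = retrieve_evolution_path("pkg.old_fn", "1.0", "3.0")
print(path)
# → [('pkg.old_fn', '1.0'), ('pkg.mid_fn', '2.0'), ('pkg.new_fn', '3.0')]
```

A retrieved path like this would then condition the second stage (path-informed code generation), constraining the model to emit the target-version API rather than the deprecated one.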