🤖 AI Summary
Code-Comment Synchronization (CCS) addresses the prevalent issue of comment staleness during software evolution, yet existing approaches suffer from poor generalizability, heavy reliance on large-scale annotated data, and suboptimal LLM performance due to low-quality in-context examples and inaccurate candidate ranking. This paper proposes a retrieval-augmented in-context learning framework: first, a hybrid retrieval mechanism integrates code-comment semantics and change patterns to construct high-quality, few-shot exemplars; second, a multi-round, rule-driven re-ranking strategy elevates the ranking priority of correct candidates. The method supports multi-language settings (Java/Python) and consistently outperforms five state-of-the-art baselines across five mainstream LLMs and three benchmark datasets. Quantitative and qualitative analyses demonstrate significant improvements in synchronization accuracy, robustness, and cross-lingual generalization capability.
📝 Abstract
Code-Comment Synchronization (CCS) aims to synchronize comments with code changes in an automated fashion, thereby significantly reducing the workload of developers during software maintenance and evolution. While previous studies have proposed various solutions that have shown success, they often exhibit limitations, such as a lack of generalization ability or the need for extensive task-specific learning resources. This motivates us to investigate the potential of Large Language Models (LLMs) in this area. However, a pilot analysis shows that LLMs fall short of State-Of-The-Art (SOTA) CCS approaches because (1) they lack instructive demonstrations for In-Context Learning (ICL) and (2) many correct-prone candidates are not prioritized. To tackle the above challenges, we propose R2ComSync, an ICL-based code-Comment Synchronization approach enhanced with Retrieval and Re-ranking. Specifically, R2ComSync introduces two corresponding novelties: (1) Ensemble hybrid retrieval. It equally considers the similarity in both code-comment semantics and change patterns during retrieval, thereby creating ICL prompts with effective examples. (2) Multi-turn re-ranking strategy. We derived three significant rules through large-scale CCS sample analysis. Given the inference results of LLMs, it progressively exploits these three re-ranking rules to prioritize relatively correct-prone candidates. We evaluate R2ComSync using five recent LLMs on three CCS datasets covering both the Java and Python programming languages, and compare it with five SOTA approaches. Extensive experiments demonstrate the superior performance of R2ComSync against the other approaches. Moreover, both quantitative and qualitative analyses provide compelling evidence that the comments synchronized by our proposal exhibit significantly higher quality.
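The ensemble hybrid retrieval described above can be sketched as follows. This is an illustrative toy example, not the authors' implementation: it uses token-multiset cosine similarity as a stand-in for whatever semantic and change-pattern representations R2ComSync actually employs, and the equal 0.5/0.5 weighting reflects the abstract's statement that both similarities are considered equally. The field names `code_comment` and `change_pattern` are hypothetical.

```python
# Hedged sketch: rank candidate ICL exemplars by an equally weighted
# combination of code-comment semantic similarity and change-pattern
# similarity. Representations here are plain token lists for clarity.
from collections import Counter
from math import sqrt


def cosine_sim(tokens_a, tokens_b):
    """Cosine similarity over token multisets (a stand-in for real embeddings)."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_retrieve(query, corpus, k=2):
    """Return the top-k corpus examples under the equally weighted hybrid score."""
    scored = []
    for example in corpus:
        sem = cosine_sim(query["code_comment"], example["code_comment"])
        pat = cosine_sim(query["change_pattern"], example["change_pattern"])
        scored.append((0.5 * sem + 0.5 * pat, example))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [example for _, example in scored[:k]]
```

A query whose code change matches an exemplar on both axes (e.g. the same rename pattern with similar comment tokens) will outrank exemplars that match on only one axis, which is the intuition behind combining the two retrieval signals rather than using semantic similarity alone.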