R2ComSync: Improving Code-Comment Synchronization with In-Context Learning and Reranking

📅 2025-10-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Code-Comment Synchronization (CCS) addresses the prevalent issue of comment staleness during software evolution, yet existing approaches suffer from poor generalizability, heavy reliance on large-scale annotated data, and suboptimal LLM performance due to low-quality in-context examples and inaccurate candidate ranking. This paper proposes a retrieval-augmented in-context learning framework: first, a hybrid retrieval mechanism integrates code-comment semantics and change patterns to construct high-quality, few-shot exemplars; second, a multi-round, rule-driven re-ranking strategy elevates the ranking priority of correct candidates. The method supports multi-language settings (Java/Python) and consistently outperforms five state-of-the-art baselines across five mainstream LLMs and three benchmark datasets. Quantitative and qualitative analyses demonstrate significant improvements in synchronization accuracy, robustness, and cross-lingual generalization capability.
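The ensemble hybrid retrieval described above scores candidate exemplars by code-comment semantics and change patterns with equal weight. The sketch below only illustrates that equally weighted scoring idea; the `Sample` structure, the token-level Jaccard similarity (a cheap stand-in for the paper's actual similarity measures), and the 0.5/0.5 weights are illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """A mined code-comment update sample (hypothetical structure)."""
    code_comment: str    # old code plus its stale comment
    change_pattern: str  # textual sketch of the code edit actions


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, standing in for a learned similarity."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def hybrid_retrieve(query: Sample, corpus: list[Sample], k: int = 3) -> list[Sample]:
    """Rank corpus samples by the equally weighted sum of code-comment
    semantic similarity and change-pattern similarity; return the top-k
    as few-shot exemplars for the ICL prompt."""
    scored = [
        (0.5 * jaccard(query.code_comment, s.code_comment)
         + 0.5 * jaccard(query.change_pattern, s.change_pattern), s)
        for s in corpus
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [s for _, s in scored[:k]]
```

Retrieved samples whose change pattern matches the query can outrank samples that are only semantically close, which is the point of blending the two signals rather than using semantic similarity alone.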

📝 Abstract
Code-Comment Synchronization (CCS) aims to synchronize comments with code changes in an automated fashion, thereby significantly reducing the workload of developers during software maintenance and evolution. While previous studies have proposed various solutions that have shown success, they often exhibit limitations, such as a lack of generalization ability or the need for extensive task-specific learning resources. This motivates us to investigate the potential of Large Language Models (LLMs) in this area. However, a pilot analysis shows that LLMs fall short of State-Of-The-Art (SOTA) CCS approaches because (1) they lack instructive demonstrations for In-Context Learning (ICL) and (2) many correct-prone candidates are not prioritized. To tackle the above challenges, we propose R2ComSync, an ICL-based code-Comment Synchronization approach enhanced with Retrieval and Re-ranking. Specifically, R2ComSync carries two corresponding novelties: (1) Ensemble hybrid retrieval: it equally considers similarity in both code-comment semantics and change patterns during retrieval, thereby creating ICL prompts with effective examples. (2) Multi-turn re-ranking strategy: we derived three significant rules through large-scale CCS sample analysis; given the inference results of LLMs, the strategy progressively applies these rules to prioritize relatively correct-prone candidates. We evaluate R2ComSync using five recent LLMs on three CCS datasets covering both the Java and Python programming languages, and compare it with five SOTA approaches. Extensive experiments demonstrate the superior performance of R2ComSync against other approaches. Moreover, both quantitative and qualitative analyses provide compelling evidence that the comments synchronized by our proposal exhibit significantly higher quality.
Problem

Research questions and friction points this paper is trying to address.

Automating code-comment synchronization to reduce developer maintenance workload
Addressing limitations of existing approaches lacking generalization and requiring extensive resources
Improving LLM performance for code-comment synchronization through retrieval and reranking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses ensemble hybrid retrieval for code-comment similarity
Implements multi-turn re-ranking strategy for prioritization
Enhances in-context learning with effective demonstration examples
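The multi-turn re-ranking strategy applies rules round by round to promote correct-prone candidates among the LLM's outputs. The paper's three rules come from large-scale CCS sample analysis and are not reproduced here; the sketch below only illustrates the multi-round mechanism, using hypothetical placeholder rules, each applied as a stable sort so later (more important) rounds dominate earlier ones.

```python
def rerank(candidates: list[str], old_comment: str, new_code: str) -> list[str]:
    """Multi-round re-ranking sketch: each round is a stable sort by one
    rule, applied from least to most important. The three rules below are
    illustrative placeholders, NOT the rules derived in the paper."""
    ranked = list(candidates)

    # Round 1 (least important): prefer concise candidates.
    ranked.sort(key=len)

    # Round 2: prefer candidates that mention tokens from the new code.
    code_tokens = set(new_code.split())
    ranked.sort(key=lambda c: -len(set(c.split()) & code_tokens))

    # Round 3 (most important): demote candidates identical to the stale
    # comment, since an unchanged comment cannot reflect the code change.
    ranked.sort(key=lambda c: c.strip() == old_comment.strip())
    return ranked
```

Because Python's sort is stable, each round only reorders candidates that the later, higher-priority rounds consider tied, which is one simple way to realize a progressive rule pipeline.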
Zhen Yang
School of Computer Science and Technology, Shandong University, 72 Binhai Rd, Qingdao, 266237, Shandong, China.
Hongyi Lin
School of Computer Science and Technology, Shandong University, 72 Binhai Rd, Qingdao, 266237, Shandong, China.
Xiao Yu
The State Key Laboratory of Blockchain and Data Security, Zhejiang University, 38 Zheda Rd, Hangzhou, 310058, Zhejiang, China.
Jacky Wai Keung
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong, 999077, China.
Shuo Liu
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong, 999077, China.
Pak Yuen Patrick Chan
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong, 999077, China.
Yicheng Sun
Department of Computer Science, City University of Hong Kong, Tat Chee Avenue, Hong Kong, 999077, China.
Fengji Zhang
Department of Computer Science, City University of Hong Kong
Software Engineering · Large Language Models