Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations

📅 2026-02-03
🤖 AI Summary
Existing code summarization approaches often fail to meet industrial developers’ expectations for terminology consistency, functional categorization, and conciseness, which limits their practical adoption. This work proposes ExpSum, a method that models these expectations for code documentation and integrates them into the summarization pipeline. ExpSum combines function metadata abstraction, context-aware domain knowledge retrieval, an information filtering mechanism, and constraint-driven prompting to guide large language models toward structured, standardized summaries. Evaluated on the industrial HarmonyOS project and on widely used code summarization benchmarks, ExpSum outperforms all baselines, with improvements of up to 26.71% in BLEU-4 and 20.10% in ROUGE-L on HarmonyOS, while producing summaries that better align with developers’ practical needs.
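For readers unfamiliar with the ROUGE-L metric reported here, the following is a minimal sketch of a sentence-level ROUGE-L F1 score based on the longest common subsequence (LCS) of tokens. It assumes whitespace tokenization, lowercasing, and equal weighting of precision and recall; the function names are illustrative, and the paper's actual evaluation settings may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L F1 (assumed settings: lowercase, whitespace split)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Made-up summaries for illustration only.
score = rouge_l_f1("Opens a media session", "Opens the media session")
```

Because the score depends only on in-order token overlap, a summary can rate well on ROUGE-L while still using the wrong domain term or omitting the function category, which is the gap the expectation study points at.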

📝 Abstract
Code summaries are essential for helping developers understand code functionality and for reducing maintenance and collaboration costs. Although recent advances in large language models (LLMs) have significantly improved automatic code summarization, the practical usefulness of generated summaries in industrial settings remains insufficiently explored. In collaboration with documentation experts from the industrial HarmonyOS project, we conducted a questionnaire study showing that over 57.4% of code summaries produced by state-of-the-art approaches were rejected because they violated developers' expectations for industrial documentation. Beyond semantic similarity to reference summaries, developers emphasize additional requirements, including the use of appropriate domain terminology, explicit function categorization, and the avoidance of redundant implementation details. To address these expectations, we propose ExpSum, an expectation-aware code summarization approach that integrates function metadata abstraction, informative metadata filtering, context-aware domain knowledge retrieval, and constraint-driven prompting to guide LLMs in generating structured, expectation-aligned summaries. We evaluate ExpSum on the HarmonyOS project and on widely used code summarization benchmarks. Experimental results show that ExpSum consistently outperforms all baselines, achieving improvements of up to 26.71% in BLEU-4 and 20.10% in ROUGE-L on HarmonyOS. Furthermore, LLM-based evaluations indicate that ExpSum-generated summaries better align with developer expectations across other projects, demonstrating its effectiveness for industrial code documentation.
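The four stages named in the abstract can be pictured with a small sketch. This is not ExpSum's implementation: every class, function, heuristic, glossary entry, and category name below is a hypothetical stand-in, chosen only to show how abstracted metadata, filtering, term retrieval, and explicit constraints might flow into one prompt.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionMetadata:
    """Hypothetical abstraction of a function: signature plus called helpers."""
    name: str
    parameters: list
    return_type: str
    callees: list = field(default_factory=list)


def filter_metadata(meta, noisy_prefixes=("log", "assert")):
    """Drop callees that look uninformative (an assumed prefix heuristic)."""
    kept = [c for c in meta.callees if not c.lower().startswith(noisy_prefixes)]
    return FunctionMetadata(meta.name, meta.parameters, meta.return_type, kept)


def retrieve_terms(meta, glossary):
    """Return glossary entries whose term occurs in the function's identifiers."""
    haystack = " ".join([meta.name, *meta.parameters, *meta.callees]).lower()
    return {t: d for t, d in glossary.items() if t.lower() in haystack}


def build_prompt(meta, terms, categories, max_words=30):
    """Assemble a prompt that states the documentation constraints explicitly."""
    lines = [
        "Summarize the function below.",
        f"Constraints: start with one category from {sorted(categories)}; "
        f"use the listed domain terms verbatim; at most {max_words} words; "
        "do not describe implementation details.",
        f"Function: {meta.name}({', '.join(meta.parameters)}) -> {meta.return_type}",
        f"Calls: {', '.join(meta.callees) or 'none'}",
        "Domain terms:",
    ]
    lines += [f"- {term}: {desc}" for term, desc in sorted(terms.items())]
    return "\n".join(lines)


# Made-up inputs for illustration only.
raw = FunctionMetadata("openMediaSession", ["sessionConfig", "callback"], "void",
                       ["logDebug", "bindSession"])
glossary = {"session": "a managed media connection (example entry)",
            "codec": "an encoder or decoder (example entry)"}
meta = filter_metadata(raw)
prompt = build_prompt(meta, retrieve_terms(meta, glossary), {"Opens", "Queries"})
```

The resulting `prompt` string would then be sent to an LLM of the reader's choice. Putting the category list, term list, and length cap in the prompt itself mirrors the three expectations the questionnaire study surfaced: categorization, terminology, and conciseness.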
Problem

Research questions and friction points this paper is trying to address.

code summarization
industrial documentation
developer expectations
domain terminology
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

expectation-aware summarization
code documentation
domain knowledge retrieval
constraint-driven prompting
industrial code summarization
Jintai Li
School of Computer Science, Wuhan University, China
Songqiang Chen
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, China
Shuo Jin
School of Computer Science, Wuhan University, China
Xiaoyuan Xie
Wuhan University
software testing, program slicing and analysis, debugging and fault-localization, search-based software engineering, evolutionar