Precision in Practice: Knowledge Guided Code Summarizing Grounded in Industrial Expectations

📅 2026-02-03

🤖 AI Summary

Existing code summarization approaches often fail to meet industrial developers' expectations for terminology consistency, functional categorization, and conciseness, which limits their practical adoption. This work proposes ExpSum, a method that systematically models developers' multidimensional expectations for code documentation and integrates them into the summarization pipeline. ExpSum combines function metadata abstraction, context-aware domain knowledge retrieval, informative metadata filtering, and constraint-driven prompting to guide large language models (LLMs) toward structured, standardized summaries. Evaluated on the industrial HarmonyOS project and on widely used code summarization benchmarks, ExpSum improves BLEU-4 by up to 26.71% and ROUGE-L by up to 20.10% on HarmonyOS, and its summaries align more closely with developers' practical needs.
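ROUGE-L, one of the two reported metrics, scores a generated summary by the longest common subsequence (LCS) it shares with a reference summary. The sketch below is a standard sentence-level ROUGE-L F-measure; the lowercase whitespace tokenization and β = 1 are assumptions, since the summary does not say how ExpSum's scores were computed.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Classic dynamic programme for the longest common subsequence,
    # keeping only the previous row to save memory.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str, beta: float = 1.0) -> float:
    """Sentence-level ROUGE-L F-measure over lowercase whitespace tokens (assumed tokenization)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the candidate covered by the LCS
    recall = lcs / len(ref)      # share of the reference covered by the LCS
    return ((1 + beta**2) * precision * recall) / (recall + beta**2 * precision)
```

For example, `rouge_l("returns the window id", "get the window id")` shares the three-token subsequence "the window id", so precision and recall are both 3/4 and the score is 0.75. Because LCS does not require contiguous matches, ROUGE-L rewards summaries that keep the reference's key terms in the same order even when extra words intervene.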

📝 Abstract

Code summaries are essential for helping developers understand code functionality and for reducing maintenance and collaboration costs. Although recent advances in large language models (LLMs) have significantly improved automatic code summarization, the practical usefulness of generated summaries in industrial settings remains insufficiently explored. In collaboration with documentation experts from the industrial HarmonyOS project, we conducted a questionnaire study showing that over 57.4% of code summaries produced by state-of-the-art approaches were rejected due to violations of developers' expectations for industrial documentation. Beyond semantic similarity to reference summaries, developers emphasize additional requirements, including the use of appropriate domain terminology, explicit function categorization, and the avoidance of redundant implementation details. To address these expectations, we propose ExpSum, an expectation-aware code summarization approach that integrates function metadata abstraction, informative metadata filtering, context-aware domain knowledge retrieval, and constraint-driven prompting to guide LLMs in generating structured, expectation-aligned summaries. We evaluate ExpSum on the HarmonyOS project and on widely used code summarization benchmarks. Experimental results show that ExpSum consistently outperforms all baselines, achieving improvements of up to 26.71% in BLEU-4 and 20.10% in ROUGE-L on HarmonyOS. Furthermore, LLM-based evaluations indicate that ExpSum-generated summaries better align with developer expectations across other projects, demonstrating its effectiveness for industrial code documentation.
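The four components named in the abstract can be pictured as stages that turn a function into a constrained LLM prompt. The sketch below is a hypothetical illustration, not ExpSum's implementation: the `FunctionMetadata` fields, the glossary, the naive lexical term matching, the category labels, and the 25-word limit are all assumptions, and the LLM call itself is omitted.

```python
import re
from dataclasses import dataclass, field


# Hypothetical "function metadata abstraction": these fields are an
# illustrative guess, not ExpSum's actual schema.
@dataclass
class FunctionMetadata:
    name: str
    signature: str
    module: str
    params: list[str] = field(default_factory=list)
    return_type: str = ""
    body: str = ""


def split_identifier(name: str) -> list[str]:
    # Split camelCase / snake_case identifiers into lowercase sub-tokens.
    parts = re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", name)
    return [p.lower() for p in parts]


def filter_metadata(meta: FunctionMetadata) -> dict[str, str]:
    # Hypothetical filter: keep interface-level facts and drop the body so
    # the prompt does not invite the model to restate implementation details.
    return {
        "name": meta.name,
        "signature": meta.signature,
        "module": meta.module,
        "params": ", ".join(meta.params) or "none",
        "returns": meta.return_type or "void",
    }


def retrieve_terms(meta: FunctionMetadata, glossary: dict[str, str]) -> dict[str, str]:
    # Naive lexical retrieval (an assumption): a glossary key matches if it
    # appears among the identifier's sub-tokens or the module path parts.
    context = set(split_identifier(meta.name))
    context.update(p.lower() for p in re.split(r"[./_]", meta.module) if p)
    return {k: v for k, v in glossary.items() if k.lower() in context}


def build_prompt(meta: FunctionMetadata, glossary: dict[str, str],
                 categories: list[str], max_words: int = 25) -> str:
    # Assemble filtered facts plus explicit constraints into one prompt.
    facts = filter_metadata(meta)
    terms = retrieve_terms(meta, glossary)
    lines = ["Summarize the function described by the metadata below."]
    lines += [f"{k}: {v}" for k, v in facts.items()]
    lines.append("Constraints:")
    lines.append(f"- Begin with one category label from: {', '.join(categories)}.")
    if terms:
        lines.append("- Use these domain terms where applicable:")
        lines += [f"  * {k}: {v}" for k, v in terms.items()]
    lines.append("- Do not describe implementation details such as loops or local variables.")
    lines.append(f"- Use at most {max_words} words.")
    return "\n".join(lines)


# Hypothetical usage: the function, glossary, and categories are invented.
meta = FunctionMetadata(
    name="getWindowId",
    signature="int32_t getWindowId() const",
    module="window_manager",
    return_type="int32_t",
    body="return windowId_;",
)
glossary = {"window": "A top-level UI surface managed by the window manager."}
print(build_prompt(meta, glossary, ["Getter", "Setter", "Callback", "Lifecycle"]))
```

Dropping the function body at the filtering stage is one simple way to discourage the redundant implementation details that developers rejected; ExpSum's actual filtering and retrieval criteria may differ.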

Problem

Research questions and friction points this paper is trying to address.

code summarization

industrial documentation

developer expectations

domain terminology

large language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

expectation-aware summarization

code documentation

domain knowledge retrieval