DocPrism: Local Categorization and External Filtering to Identify Relevant Code-Documentation Inconsistencies

📅 2025-10-31

🤖 AI Summary

Inconsistencies between source code and its documentation can cause developer misunderstandings and software defects. Existing large language model (LLM)-based detection suffers from high false-positive rates because it frequently reports natural gaps between high-level documentation and low-level implementation details as errors. This paper presents DocPrism, a lightweight, multi-language (Python, TypeScript, C++, Java) inconsistency detection tool built on the Local Categorization, External Filtering (LCEF) methodology. Its core contributions are: (1) Local Categorization, a context-local prompting strategy that guides the LLM to produce fine-grained, semantically grounded classifications and relies on its local completion skills rather than its long-term reasoning; and (2) External Filtering, domain-informed, rule-based post-processing that removes naturally occurring, non-defective discrepancies. The approach requires no LLM fine-tuning and uses a standard off-the-shelf model with localized prompts. In the ablation study, LCEF lowers the inconsistency flag rate from 98% to 14% and raises accuracy from 14% to 94%. Across the four languages, DocPrism maintains a low flag rate of 15% and achieves a precision of 0.62.

📝 Abstract

Code-documentation inconsistencies are common and undesirable: they can lead to developer misunderstandings and software defects. This paper introduces DocPrism, a multi-language code-documentation inconsistency detection tool. DocPrism uses a standard large language model (LLM) to analyze and explain inconsistencies. Plain use of LLMs for this task yields unacceptably high false-positive rates: LLMs identify natural gaps between high-level documentation and detailed code implementations as inconsistencies. We introduce and apply the Local Categorization, External Filtering (LCEF) methodology to reduce false positives. LCEF relies on the LLM's local completion skills rather than its long-term reasoning skills. In our ablation study, LCEF reduces DocPrism's inconsistency flag rate from 98% to 14% and increases accuracy from 14% to 94%. In a broad evaluation across Python, TypeScript, C++, and Java, DocPrism maintains a low flag rate of 15% and achieves a precision of 0.62 without any fine-tuning.
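The abstract reports three quantities: flag rate, accuracy, and precision. The sketch below shows one common way to compute them from per-item results. It assumes the usual definitions (flag rate = flagged items / all items, precision = correctly flagged / flagged, accuracy = correct decisions / all items); the paper's exact definitions may differ. The function name `summarize` and the toy inputs are hypothetical and are not data from the paper.

```python
from typing import Dict, List


def summarize(flagged: List[bool], inconsistent: List[bool]) -> Dict[str, float]:
    """Compute flag rate, precision, and accuracy for a detector.

    flagged[i]      -- True if the tool flagged item i (hypothetical input)
    inconsistent[i] -- True if item i is a real inconsistency (ground truth)
    """
    total = len(flagged)
    n_flagged = sum(flagged)
    # Items that were flagged and are real inconsistencies.
    true_pos = sum(f and t for f, t in zip(flagged, inconsistent))
    # Items where the tool's decision matches the ground truth.
    correct = sum(f == t for f, t in zip(flagged, inconsistent))
    return {
        "flag_rate": n_flagged / total,
        "precision": true_pos / n_flagged if n_flagged else 0.0,
        "accuracy": correct / total,
    }


# Made-up toy example with five items, for illustration only.
toy_flagged = [True, True, False, False, False]
toy_truth = [True, False, False, False, True]
toy_metrics = summarize(toy_flagged, toy_truth)
```

Under these definitions, a detector that flags nearly everything has a high flag rate and, when real inconsistencies are rare, low accuracy. That matches the direction of the ablation numbers quoted above.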

Problem

Research questions and friction points this paper is trying to address.

Detecting code-documentation inconsistencies across multiple programming languages

Reducing false positives in inconsistency detection using LLMs

Improving accuracy by filtering natural gaps between documentation and code

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a standard LLM for code-documentation analysis, without fine-tuning

Applies LCEF methodology to reduce false positives

Relies on local completion over long-term reasoning
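
The points above can be read as a two-stage pipeline: the LLM only assigns a category label to each code-documentation pair (a short, local completion), and ordinary code outside the LLM decides which categories count as reportable. The following is a minimal sketch of that reading, not DocPrism's actual implementation. The category names, the prompt wording, the `FLAGGED` set, and the `fake_llm` stub are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical category labels; the paper's real taxonomy is not given here.
CATEGORIES = ("consistent", "doc_omits_detail", "contradiction")

# Assumption: only outright contradictions are reported to the developer.
FLAGGED = {"contradiction"}


@dataclass
class Finding:
    doc: str
    code: str
    category: str
    flagged: bool


def build_prompt(doc: str, code: str) -> str:
    """Ask for a single label so the model performs a short local completion."""
    options = ", ".join(CATEGORIES)
    return (
        f"Documentation:\n{doc}\n\nCode:\n{code}\n\n"
        f"Answer with exactly one of: {options}\nCategory:"
    )


def categorize(doc: str, code: str, complete: Callable[[str], str]) -> str:
    """Local Categorization: map the model's completion to a known label."""
    raw = complete(build_prompt(doc, code)).strip().lower()
    return raw if raw in CATEGORIES else "unknown"


def external_filter(category: str) -> bool:
    """External Filtering: decided by plain code, not by the model."""
    return category in FLAGGED


def analyze(pairs: List[Tuple[str, str]], complete: Callable[[str], str]) -> List[Finding]:
    findings = []
    for doc, code in pairs:
        category = categorize(doc, code, complete)
        findings.append(Finding(doc, code, category, external_filter(category)))
    return findings


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; simple keyword matching for the demo."""
    if "ascending" in prompt and "reverse=True" in prompt:
        return " Contradiction\n"
    if "retries" in prompt:
        return "doc_omits_detail"
    return "consistent"


PAIRS = [
    ("Return the items in ascending order.", "return sorted(items, reverse=True)"),
    ("Fetch the URL.", "for attempt in range(retries): response = get(url)"),
    ("Add two numbers.", "return a + b"),
]
results = analyze(PAIRS, fake_llm)
```

In this sketch the second pair is a natural gap (the documentation simply omits the retry detail), so it is categorized but filtered out. Only the first pair, where the documentation and code disagree, is flagged. Swapping `fake_llm` for a real model client would leave the filtering logic unchanged.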
