Domain-constrained Synthesis of Inconsistent Key Aspects in Textual Vulnerability Descriptions

📅 2025-11-20

🤖 AI Summary

Inconsistent representation of Textual Vulnerability Descriptions (TVDs) across sources hinders comprehensive vulnerability understanding and slows analysis. To address this, we propose a domain-constrained, Large Language Model (LLM)-based synthesis framework with three stages: (1) rule-driven extraction of key aspects; (2) self-assessment of semantic variability, anchored on domain-specific keywords; and (3) information-entropy-guided fusion of multiple sources. The key innovations are a keyword-anchored self-assessment mechanism and an entropy-guided fusion strategy, which together unify heterogeneous key aspects while preserving the information in the original descriptions. We also build Digest Labels, a tool for visualizing TVDs, to improve interpretability and usability. Experiments report an F1 score of 0.87 for key aspect augmentation (up from 0.82, a gain of 0.05 absolute) and an improvement of over 30% in comprehension and efficiency. Human evaluation indicates that Digest Labels significantly improves usability.
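The extraction stage can be pictured as a fixed template that names every key aspect the model must fill, plus a check that the reply actually covers all of them. The sketch below is a minimal illustration under stated assumptions: the aspect names in `KEY_ASPECTS`, the helpers `build_extraction_prompt` and `validate_extraction`, the JSON reply format, and the product name `ExampleCMS` are all hypothetical, and no real LLM is called.

```python
import json

# Hypothetical aspect set (assumption): the summary does not list the key
# aspects, so these names follow common TVD schemas for illustration only.
KEY_ASPECTS = (
    "vulnerability_type",
    "affected_product",
    "affected_version",
    "root_cause",
    "attack_vector",
    "impact",
)


def build_extraction_prompt(tvd: str) -> str:
    """Render a fixed template asking the model for every key aspect."""
    fields = "\n".join(f'- "{name}": string, or null if absent' for name in KEY_ASPECTS)
    return (
        "Extract the following key aspects from the vulnerability description.\n"
        "Reply with one JSON object containing exactly these keys:\n"
        f"{fields}\n\n"
        f"Description:\n{tvd}"
    )


def validate_extraction(raw_reply: str) -> dict:
    """Parse a model reply and reject it if any template aspect is missing."""
    data = json.loads(raw_reply)
    missing = [name for name in KEY_ASPECTS if name not in data]
    if missing:
        raise ValueError(f"template violation, missing aspects: {missing}")
    # Keep only the aspects the template asked for, in template order.
    return {name: data[name] for name in KEY_ASPECTS}


# Usage with a stubbed reply (no LLM is called; values are illustrative).
prompt = build_extraction_prompt("SQL injection in ExampleCMS before 2.1 via the id parameter.")
reply = json.dumps({
    "vulnerability_type": "SQL injection",
    "affected_product": "ExampleCMS",
    "affected_version": "before 2.1",
    "root_cause": None,
    "attack_vector": "crafted id parameter",
    "impact": None,
})
aspects = validate_extraction(reply)
```

Rejecting replies with missing keys is one simple way to enforce the goal of capturing all critical details; a real pipeline would send `prompt` to whatever LLM client is available and pass its reply string to `validate_extraction`.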

📝 Abstract

Textual Vulnerability Descriptions (TVDs) are crucial for security analysts to understand and address software vulnerabilities. However, inconsistencies in the key aspects of TVDs from different repositories make it difficult to achieve a comprehensive understanding of vulnerabilities. Existing approaches aim to mitigate these inconsistencies by aligning TVDs with external knowledge bases, but they often discard valuable information and fail to synthesize comprehensive representations. In this paper, we propose a domain-constrained LLM-based synthesis framework for unifying key aspects of TVDs. Our framework consists of three stages: 1) Extraction, guided by rule-based templates to ensure all critical details are captured; 2) Self-evaluation, using domain-specific anchor words to assess semantic variability across sources; and 3) Fusion, leveraging information entropy to reconcile inconsistencies and prioritize relevant details. This framework improves synthesis performance, increasing the F1 score for key aspect augmentation from 0.82 to 0.87, while enhancing comprehension and efficiency by over 30%. We further develop Digest Labels, a practical tool for visualizing TVDs, which human evaluations show significantly boosts usability.
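The self-evaluation stage compares how different sources phrase the same aspect. One simple way to make "semantic variability anchored on domain-specific anchor words" concrete is to collect the anchor words each source mentions and measure how much those sets differ. This is a minimal sketch, not the paper's method: the anchor vocabulary in `ANCHOR_WORDS`, the choice of mean pairwise Jaccard distance, and the function names are all assumptions for illustration.

```python
import re

# Hypothetical anchor vocabulary (assumption): the paper's actual domain
# anchor words are not given in this summary.
ANCHOR_WORDS = {
    "sql injection", "cross-site scripting", "buffer overflow", "use-after-free",
    "remote attacker", "authenticated", "denial of service", "arbitrary code",
    "crafted request", "privilege escalation",
}


def anchors_in(text: str) -> set[str]:
    """Return the anchor phrases found in text (case-insensitive, whole-phrase match)."""
    lowered = text.lower()
    return {a for a in ANCHOR_WORDS if re.search(rf"\b{re.escape(a)}\b", lowered)}


def semantic_variability(values: list[str]) -> float:
    """Mean pairwise Jaccard distance between the anchor sets of each source's text.

    0.0 means all sources mention the same anchors; 1.0 means no pair overlaps.
    """
    sets = [anchors_in(v) for v in values]
    distances = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            union = sets[i] | sets[j]
            if not union:
                distances.append(0.0)  # neither text mentions an anchor: treat as agreeing
            else:
                distances.append(1 - len(sets[i] & sets[j]) / len(union))
    return sum(distances) / len(distances) if distances else 0.0


nvd_like = "SQL injection allows a remote attacker to execute arbitrary code."
vendor_like = "A remote attacker can perform SQL injection via a crafted request."
score = semantic_variability([nvd_like, vendor_like])
```

Here the two descriptions share two of their four distinct anchors, so `score` is 0.5; a high score could flag an aspect whose sources need reconciling in the fusion stage.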

Problem

Research questions and friction points this paper is trying to address.

Synthesizing inconsistent key aspects in vulnerability descriptions

Addressing information loss in existing inconsistency mitigation approaches

Unifying fragmented vulnerability information across different repositories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-constrained LLM framework synthesizes key vulnerability aspects

Rule-based extraction and self-evaluation using domain anchor words

Information entropy fusion reconciles inconsistencies in vulnerability descriptions
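Entropy-based fusion can be sketched with a simple rule, offered here as an assumption rather than the paper's algorithm: compute the Shannon entropy of the distribution of normalized values that repositories report for one aspect. Low entropy means the sources mostly agree, so the majority value is kept; high entropy means they disagree, so every distinct value is kept (most frequent first) instead of discarding information. The threshold, the whitespace and case normalization, and the names `shannon_entropy` and `fuse_aspect` are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy, in bits, of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c)


def fuse_aspect(values: list[str], threshold: float = 1.0) -> list[str]:
    """Fuse one key aspect reported by several repositories (hypothetical rule).

    Entropy below the threshold: keep only the majority value.
    Otherwise: keep every distinct value, ordered by how many sources report it.
    """
    # Normalize case and whitespace; drop empty reports.
    normalized = [" ".join(v.lower().split()) for v in values if v and v.strip()]
    if not normalized:
        return []
    counts = Counter(normalized)
    if shannon_entropy(counts) < threshold:
        return [counts.most_common(1)[0][0]]
    return [value for value, _ in counts.most_common()]


agreeing = fuse_aspect(["Denial of service", "denial of  service", "denial of service", "crash"])
conflicting = fuse_aspect(["denial of service", "information disclosure", "arbitrary code execution"])
```

Three reports of "denial of service" and one of "crash" give an entropy of about 0.81 bits, below the 1.0 default, so only the majority value survives; three different impacts give log2(3), about 1.58 bits, so all three are kept. Using a strict comparison means an even two-way split (exactly 1 bit) keeps both values rather than picking one arbitrarily.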