AI Summary
Textual Vulnerability Descriptions (TVDs) for the same vulnerability are represented inconsistently across sources, which hinders comprehensive vulnerability understanding and slows analysis. To address this, we propose a domain-constrained Large Language Model (LLM)-based synthesis framework comprising three stages: (1) rule-driven key information extraction; (2) semantic variability self-assessment anchored on domain-specific keywords; and (3) information-entropy-guided multi-source fusion. Our key innovations are a keyword-anchored self-assessment mechanism and an entropy-weighted fusion strategy, which together give a unified representation of heterogeneous elements while preserving the fidelity of the original information. We further design Digest Labels, a visual analytics tool, to enhance interpretability and usability. Experiments show that our approach raises the F1 score for key aspect augmentation from 0.82 to 0.87 (an absolute gain of 0.05) and improves vulnerability comprehension and efficiency by over 30%. Human evaluation confirms its superior accuracy and practical utility compared to baseline methods.
Abstract
Textual Vulnerability Descriptions (TVDs) are crucial for security analysts to understand and address software vulnerabilities. However, TVDs from different repositories are often inconsistent in their key aspects, which makes it difficult to reach a comprehensive understanding of a vulnerability. Existing approaches aim to mitigate inconsistencies by aligning TVDs with external knowledge bases, but they often discard valuable information and fail to synthesize comprehensive representations. In this paper, we propose a domain-constrained LLM-based synthesis framework for unifying key aspects of TVDs. Our framework consists of three stages: 1) Extraction, guided by rule-based templates to ensure all critical details are captured; 2) Self-evaluation, using domain-specific anchor words to assess semantic variability across sources; and 3) Fusion, leveraging information entropy to reconcile inconsistencies and prioritize relevant details. This framework improves synthesis performance, increasing the F1 score for key aspect augmentation from 0.82 to 0.87, while enhancing comprehension and efficiency by over 30%. We further develop Digest Labels, a practical tool for visualizing TVDs, which human evaluations show significantly boosts usability.
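The abstract does not spell out how information entropy drives the fusion stage, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes each source has already yielded a short string for a given key aspect, measures disagreement across sources with Shannon entropy, and keeps every distinct value ranked by how many sources support it. The function names `shannon_entropy` and `fuse_aspect` and the example aspect values are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    counts = Counter(values)
    # Each term is -p * log2(p); identical values everywhere give 0.0.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def fuse_aspect(values):
    """Fuse one key aspect reported by several sources (hypothetical helper).

    Returns (entropy, ranked_values). Entropy of 0.0 means all sources agree;
    higher entropy means more disagreement. No value is discarded: distinct
    values are kept and ordered by the number of supporting sources.
    """
    # Assumption: light normalization (trim + lowercase) is enough to detect
    # agreement. A real system would need semantic matching instead.
    normalized = [v.strip().lower() for v in values if v and v.strip()]
    if not normalized:
        return 0.0, []
    ranked = [value for value, _ in Counter(normalized).most_common()]
    return shannon_entropy(normalized), ranked


if __name__ == "__main__":
    # Hypothetical "attacker type" aspect as stated by three repositories.
    entropy, ranked = fuse_aspect(
        ["remote attacker", "Remote attacker ", "local user"]
    )
    print(f"entropy={entropy:.3f} bits, fused={ranked}")
```

In this sketch, a low-entropy aspect could be reported as a single agreed value, while a high-entropy aspect would be flagged for reconciliation with all variants preserved, which matches the stated goal of not discarding information.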