🤖 AI Summary
Vulnerability Type Prediction (VTP) and Line-level Vulnerability Detection (LVD) suffer from scarce labeled data and severe class imbalance, particularly for rare vulnerability types, and treating the two tasks in isolation limits the semantic information they can share. To address these problems, this paper proposes a unified framework that integrates Embedding-Layer Driven Adversarial Training (EDAT) with Multi-Task Learning (MTL). EDAT perturbs the embedding layer, guided by semantic importance, to generate high-fidelity synthetic samples that mitigate data bias. MTL jointly optimizes VTP and LVD, two strongly correlated tasks, by sharing low-level representations. Experiments show that the approach outperforms state-of-the-art methods on both tasks: it achieves substantial gains in VTP F1-score, especially for rare vulnerabilities, and improves LVD accuracy while markedly reducing false positives.
📝 Abstract
Context: Software vulnerabilities pose a significant threat to modern software systems, as evidenced by the growing number of reported vulnerabilities and cyberattacks. These trends underscore the urgent need for approaches that can automatically detect and understand software vulnerabilities. Objective: The scarcity of labeled samples and the class imbalance in vulnerability datasets present significant challenges for both Vulnerability Type Prediction (VTP) and Line-level Vulnerability Detection (LVD), especially for rare yet critical vulnerability types. Moreover, most existing studies treat VTP and LVD as independent tasks, overlooking their inherent correlation and thereby limiting the potential to leverage semantic patterns shared across tasks. Methods: To address these limitations, we propose a unified approach that integrates Embedding-Layer Driven Adversarial Training (EDAT) with Multi-Task Learning (MTL). Specifically, EDAT enhances model robustness by introducing adversarial perturbations to identifier embeddings, guided by semantic importance. Meanwhile, MTL improves overall performance by leveraging shared representations and inter-task correlations between VTP and LVD. Results: Extensive experiments demonstrate that the proposed approach outperforms state-of-the-art baselines on both VTP and LVD. For VTP, it yields notable improvements in accuracy, precision, recall, and F1-score, particularly in identifying rare vulnerability types. For LVD, it improves line-level detection accuracy while significantly reducing false positives. Conclusion: Our study demonstrates that combining EDAT with MTL provides a unified solution that improves performance on both tasks and warrants further investigation.
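The Methods summary above can be illustrated with a small toy sketch. It is not the paper's implementation: the abstract does not specify the perturbation rule, the model, or the loss weighting, so everything here is an assumption made for illustration. The sketch uses an FGSM-style sign step on embeddings, scaled per token by a hypothetical `importance` weight, and a joint loss that is a weighted sum of a VTP cross-entropy and an LVD binary cross-entropy over a shared embedding input. The names `joint_loss`, `numeric_grad`, `perturb`, `alpha`, and `epsilon` are all hypothetical, and for brevity each embedding row is also treated as one code line by the LVD head.

```python
import math

def softmax_xent(logits, label):
    # Cross-entropy of a softmax over `logits` against an integer class label.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]

def bce(logit, label):
    # Numerically stable binary cross-entropy computed from a raw logit.
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))

def joint_loss(emb, type_w, line_w, type_label, line_labels, alpha=0.5):
    # Assumed multi-task objective: alpha * VTP loss + (1 - alpha) * LVD loss.
    dim = len(emb[0])
    # Shared input: the VTP head reads the mean-pooled embeddings.
    pooled = [sum(tok[d] for tok in emb) / len(emb) for d in range(dim)]
    type_logits = [sum(w[d] * pooled[d] for d in range(dim)) for w in type_w]
    vtp = softmax_xent(type_logits, type_label)
    # The LVD head scores every row (treated as one line) independently.
    lvd = sum(
        bce(sum(line_w[d] * tok[d] for d in range(dim)), y)
        for tok, y in zip(emb, line_labels)
    ) / len(emb)
    return alpha * vtp + (1 - alpha) * lvd

def numeric_grad(loss_fn, emb, h=1e-5):
    # Central-difference gradient of the loss with respect to each embedding entry.
    grads = [[0.0] * len(row) for row in emb]
    for t, row in enumerate(emb):
        for d in range(len(row)):
            orig = row[d]
            row[d] = orig + h
            up = loss_fn(emb)
            row[d] = orig - h
            down = loss_fn(emb)
            row[d] = orig  # restore the original value
            grads[t][d] = (up - down) / (2 * h)
    return grads

def perturb(emb, grads, importance, epsilon=0.05):
    # FGSM-style step: move each entry by epsilon * importance * sign(gradient).
    def sign(g):
        return (g > 0) - (g < 0)
    return [
        [x + epsilon * w * sign(g) for x, g in zip(row, grow)]
        for row, grow, w in zip(emb, grads, importance)
    ]

# Toy data (all values are made up for illustration).
emb = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.1]]   # three token embeddings
type_w = [[1.0, -0.5], [-0.3, 0.8]]            # two vulnerability types
line_w = [0.7, -0.2]                           # line-level head
importance = [1.0, 0.0, 0.5]                   # hypothetical semantic importance

def loss_fn(e):
    return joint_loss(e, type_w, line_w, type_label=0, line_labels=[1, 0, 0])

clean_loss = loss_fn(emb)
adv_emb = perturb(emb, numeric_grad(loss_fn, emb), importance)
adv_loss = loss_fn(adv_emb)
print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In this sketch a token with zero importance is left untouched, while more important tokens receive larger perturbations. Adversarial training would then add the loss on `adv_emb` to the training objective; that step is omitted here because the abstract does not describe it.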