🤖 AI Summary
This work investigates whether the relationship between self-admitted technical debt (SATD) and software vulnerabilities can be exploited to improve the automatic detection of both. We propose VulSATD, a multi-task deep learner built on CodeBERT that detects SATD and vulnerable code through a shared representation. VulSATD is evaluated on MADE-WIC, a fused dataset of functions annotated for technical debt (via SATD) and for vulnerabilities, and is compared against single-task approaches. The multi-task approach shows no significant difference from single-task learning, even when a weighted loss is used. This suggests that SATD and vulnerabilities are not strongly coupled in general; one possible explanation is that only some types of technical debt are directly associated with security concerns. The study therefore calls for further investigation into how different types of technical debt relate to software vulnerabilities.
📝 Abstract
Multi-task learning is a paradigm that leverages information from related tasks to improve the performance of machine learning models. Self-Admitted Technical Debt (SATD) refers to code comments that flag not-quite-right code introduced to meet short-term needs, i.e., technical debt (TD). Previous research has provided evidence of a possible relationship between SATD and the presence of vulnerabilities in the code. In this work, we investigate whether multi-task learning can leverage the information shared between SATD and vulnerabilities to improve the automatic detection of both issues. To this end, we implemented VulSATD, a deep learner based on CodeBERT, a pre-trained transformer model, that detects vulnerable and SATD code. We evaluated VulSATD on MADE-WIC, a fused dataset of functions annotated for TD (through SATD) and for vulnerabilities. Comparing single-task and multi-task approaches, we found no significant differences, even after employing a weighted loss. Our findings indicate the need for further investigation into the relationship between these two aspects of low-quality code. In particular, it is possible that only a subset of technical debt is directly associated with security concerns. The relationship between different types of technical debt and software vulnerabilities therefore deserves further exploration and deeper understanding.
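The abstract does not detail VulSATD's architecture beyond a shared CodeBERT encoder, nor does it say exactly what the weighted loss weights. As a minimal sketch, assuming hard parameter sharing (one shared function embedding feeding two binary heads, one for vulnerability and one for SATD) and a loss that combines per-task weights with a positive-class weight for label imbalance, the training objective could look like the pure-Python code below. All names (`task_head`, `weighted_bce`, `multitask_loss`) and the toy embedding standing in for CodeBERT's output are hypothetical; a real implementation would use a deep learning framework and the actual encoder.

```python
import math
from typing import Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def task_head(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical linear head: maps the SHARED embedding to one task's probability."""
    return sigmoid(sum(e * w for e, w in zip(embedding, weights)) + bias)


def weighted_bce(probs: Sequence[float], labels: Sequence[int], pos_weight: float = 1.0) -> float:
    """Mean binary cross-entropy; positive examples are scaled by pos_weight
    (an assumed way to counter class imbalance)."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def multitask_loss(
    vul_probs: Sequence[float], vul_labels: Sequence[int],
    satd_probs: Sequence[float], satd_labels: Sequence[int],
    task_weights: Tuple[float, float] = (1.0, 1.0),
    pos_weights: Tuple[float, float] = (1.0, 1.0),
) -> float:
    """Weighted sum of the two per-task losses (assumed form of the joint objective)."""
    w_vul, w_satd = task_weights
    l_vul = weighted_bce(vul_probs, vul_labels, pos_weights[0])
    l_satd = weighted_bce(satd_probs, satd_labels, pos_weights[1])
    return w_vul * l_vul + w_satd * l_satd


if __name__ == "__main__":
    # Toy stand-in for a shared encoder embedding of one function (hypothetical values).
    emb = [0.2, -0.4, 0.7]
    p_vul = task_head(emb, [0.5, 0.1, -0.3], 0.0)
    p_satd = task_head(emb, [-0.2, 0.6, 0.9], 0.1)
    loss = multitask_loss([p_vul], [1], [p_satd], [0], task_weights=(1.0, 0.5), pos_weights=(3.0, 1.0))
    print(f"vul={p_vul:.3f} satd={p_satd:.3f} loss={loss:.3f}")
```

Because both heads read the same embedding, gradients from the SATD loss also shape the representation used for vulnerability detection; this sharing is the mechanism multi-task learning relies on. Setting one task weight to zero recovers a single-task objective, which is one simple way to frame the single-task versus multi-task comparison described above.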