📝 Abstract
In recent years, machine learning has demonstrated impressive results in various fields, including software vulnerability detection. Nonetheless, applying machine learning to identify software vulnerabilities introduces new challenges, especially regarding the scale of data involved, which was not a factor in traditional methods. Consequently, despite the rise of new machine-learning-based approaches in this space, important shortcomings persist in how they are evaluated. First, researchers often fail to report concrete statistics about their training datasets, such as the number of samples per vulnerability type. Moreover, many methods train on semantically similar functions rather than directly on vulnerable programs, which casts doubt on the suitability of the datasets currently used for training. Second, the choice of model and the granularity at which models are trained also affect the effectiveness of such vulnerability discovery approaches.
In this paper, we examine the challenges of applying machine learning to vulnerability discovery. We also share insights from our two previous research papers, Bin2vec and BinHunter, that could inform future research in this field.