🤖 AI Summary
In automated vulnerability repair (AVR), machine learning (ML) filters struggle to replace test-based oracle validation because their predictions are unreliable. Method: This paper proposes a novel hierarchical filtering paradigm, “ML-assisted, not ML-replacing, test oracles,” and theoretically derives the precision and recall bounds an ML pre-filter must satisfy to improve end-to-end efficiency. It places lightweight classification models in front of conventional test-based validation and evaluates the approach analytically, using detector results published in the literature, against test-driven AVR pipelines such as APR4Vuln. Contribution/Results: The analysis shows that when the ML model satisfies the derived theoretical bounds and its inference latency falls below a critical threshold relative to the cost of test-based validation, pre-filtering significantly boosts throughput and repair efficiency. This work establishes a verifiable theoretical framework and practical guidelines for trustworthy, human-in-the-loop ML integration in software vulnerability repair.
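The paper derives its bounds formally; as a rough intuition for why both a quality bound and a latency bound arise, consider the following back-of-the-envelope cost model (our illustration, not the paper's derivation). Assume each candidate patch is correct with base rate p, a full build-and-test validation costs t_test per candidate, and the ML pre-filter costs t_ml per candidate with precision π and recall r on the “patch is correct” class; only candidates the filter accepts are tested, and generation continues until one correct patch is validated.

```latex
\begin{aligned}
C_{\mathrm{base}}   &= \frac{t_{\mathrm{test}}}{p}
  &&\text{(test every candidate)}\\[2pt]
C_{\mathrm{filter}} &= \frac{t_{\mathrm{ml}}}{p\,r} + \frac{t_{\mathrm{test}}}{\pi}
  &&\text{(pass rate } p r/\pi\text{; a correct patch survives w.p. } p r\text{)}\\[2pt]
C_{\mathrm{filter}} < C_{\mathrm{base}}
  &\iff t_{\mathrm{ml}} < r\left(1 - \frac{p}{\pi}\right) t_{\mathrm{test}}
\end{aligned}
```

In this model the filter can only help when its precision exceeds the base rate of correct candidates (π > p), and the tolerable inference latency shrinks as recall drops, mirroring the summary's claim that both the quality bounds and a latency threshold must hold simultaneously.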
📝 Abstract
[Context:] The acceptance of candidate patches in automated program repair has typically been based on test oracles. Testing typically requires a costly process of building the application, whereas ML models can classify patches quickly, allowing more candidate patches to be generated in a positive feedback loop. [Problem:] If the model predictions are unreliable (as in vulnerability detection), they can hardly replace the more reliable test-based oracles. [New Idea:] We propose using an ML model as a preliminary filter of candidate patches, placed in front of a traditional test-based filter. [Preliminary Results:] We identify theoretical bounds on the precision and recall of the ML algorithm that make such an arrangement meaningful in practice. With these bounds and results published in the literature, we calculate how fast some state-of-the-art vulnerability detectors must be to outperform a traditional AVR pipeline, such as APR4Vuln, that relies on testing alone.
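To see how such a break-even calculation might look numerically, here is a minimal Python sketch of the same illustrative cost model described above. Every number in it (base rate, build-and-test time, detector precision/recall, per-candidate latency) is a hypothetical placeholder, not a figure from the paper or from APR4Vuln.

```python
"""Back-of-the-envelope check: does an ML pre-filter pay off?

Illustrative sketch only: the cost model and all numbers below are
assumptions for demonstration. Candidates are treated as i.i.d. with
base rate p of being correct, and the pipeline stops at the first
patch that passes the (perfect) test oracle.
"""


def expected_cost_baseline(p: float, t_test: float) -> float:
    """Expected seconds to validate one correct patch by testing alone."""
    return t_test / p


def expected_cost_filtered(p: float, t_test: float, t_ml: float,
                           precision: float, recall: float) -> float:
    """Expected seconds with an ML pre-filter in front of the test oracle.

    A correct candidate survives the filter with probability p * recall;
    any candidate reaches the expensive test stage with probability
    p * recall / precision, the filter's overall pass rate.
    """
    pass_rate = p * recall / precision
    cost_per_candidate = t_ml + pass_rate * t_test
    return cost_per_candidate / (p * recall)


def max_useful_latency(p: float, t_test: float,
                       precision: float, recall: float) -> float:
    """Largest per-candidate ML latency at which filtering still helps."""
    return recall * (1.0 - p / precision) * t_test


if __name__ == "__main__":
    # Hypothetical numbers: 5% of candidates are correct, a full
    # build-and-test cycle takes 300 s, and the detector scores
    # 0.60 precision / 0.80 recall at 2 s per candidate.
    p, t_test, t_ml = 0.05, 300.0, 2.0
    precision, recall = 0.60, 0.80

    print(f"baseline cost per validated patch: {expected_cost_baseline(p, t_test):8.1f} s")
    print(f"filtered cost per validated patch: {expected_cost_filtered(p, t_test, t_ml, precision, recall):8.1f} s")
    print(f"latency budget for the detector:   {max_useful_latency(p, t_test, precision, recall):8.1f} s")
```

With these placeholder numbers the filtered pipeline costs roughly 550 s per validated patch against 6000 s for testing alone, and the detector could take up to about 220 s per candidate before pre-filtering stopped paying off; the same functions can be fed the precision/recall figures published for real vulnerability detectors to reproduce the kind of speed comparison the abstract describes.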