🤖 AI Summary
In automated vulnerability repair (AVR), machine learning (ML) filters struggle to replace test-based oracle validation because their predictions are unreliable. Method: This paper proposes a novel hierarchical filtering paradigm, “ML-assisted, not ML-replacing, test oracles,” and theoretically derives the precision and recall bounds an ML pre-filter must satisfy to improve end-to-end efficiency. It places lightweight classification models in front of conventional test-based validation and evaluates the approach analytically, using detector results published in the literature, against test-driven AVR pipelines such as APR4Vuln. Contribution/Results: The analysis shows that when the ML model satisfies the derived theoretical bounds and its inference latency falls below a critical threshold relative to the cost of test-based validation, pre-filtering significantly boosts throughput and repair efficiency. This work establishes a verifiable theoretical framework and practical guidelines for trustworthy, human-in-the-loop ML integration in software vulnerability repair.
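The paper derives its bounds formally; as a rough intuition for why both a quality bound and a latency bound arise, consider the following back-of-the-envelope cost model (our illustration, not the paper's derivation). Assume each candidate patch is correct with base rate p, a full build-and-test validation costs t_test per candidate, and the ML pre-filter costs t_ml per candidate with precision π and recall r on the “patch is correct” class; only candidates the filter accepts are tested, and generation continues until one correct patch is validated.

```latex
\begin{aligned}
C_{\mathrm{base}}   &= \frac{t_{\mathrm{test}}}{p}
  &&\text{(test every candidate)}\\[2pt]
C_{\mathrm{filter}} &= \frac{t_{\mathrm{ml}}}{p\,r} + \frac{t_{\mathrm{test}}}{\pi}
  &&\text{(pass rate } p r/\pi\text{; a correct patch survives w.p. } p r\text{)}\\[2pt]
C_{\mathrm{filter}} < C_{\mathrm{base}}
  &\iff t_{\mathrm{ml}} < r\left(1 - \frac{p}{\pi}\right) t_{\mathrm{test}}
\end{aligned}
```

In this model the filter can only help when its precision exceeds the base rate of correct candidates (π > p), and the tolerable inference latency shrinks as recall drops, mirroring the summary's claim that both the quality bounds and a latency threshold must hold simultaneously.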
📝 Abstract
[Context:] The acceptance of candidate patches in automated program repair has typically been based on test oracles. Testing typically requires a costly process of building the application, whereas ML models can classify patches quickly, allowing more candidate patches to be generated in a positive feedback loop. [Problem:] If the model predictions are unreliable (as in vulnerability detection), they can hardly replace the more reliable test-based oracles. [New Idea:] We propose using an ML model as a preliminary filter of candidate patches, placed in front of a traditional test-based filter. [Preliminary Results:] We identify theoretical bounds on the precision and recall of the ML algorithm that make such an arrangement meaningful in practice. With these bounds and results published in the literature, we calculate how fast some state-of-the-art vulnerability detectors must be to outperform a traditional AVR pipeline, such as APR4Vuln, that relies on testing alone.
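To see how such a break-even calculation might look numerically, here is a minimal Python sketch of the same illustrative cost model described above. Every number in it (base rate, build-and-test time, detector precision/recall, per-candidate latency) is a hypothetical placeholder, not a figure from the paper or from APR4Vuln.

```python
"""Back-of-the-envelope check: does an ML pre-filter pay off?

Illustrative sketch only: the cost model and all numbers below are
assumptions for demonstration. Candidates are treated as i.i.d. with
base rate p of being correct, and the pipeline stops at the first
patch that passes the (perfect) test oracle.
"""


def expected_cost_baseline(p: float, t_test: float) -> float:
    """Expected seconds to validate one correct patch by testing alone."""
    return t_test / p


def expected_cost_filtered(p: float, t_test: float, t_ml: float,
                           precision: float, recall: float) -> float:
    """Expected seconds with an ML pre-filter in front of the test oracle.

    A correct candidate survives the filter with probability p * recall;
    any candidate reaches the expensive test stage with probability
    p * recall / precision, the filter's overall pass rate.
    """
    pass_rate = p * recall / precision
    cost_per_candidate = t_ml + pass_rate * t_test
    return cost_per_candidate / (p * recall)


def max_useful_latency(p: float, t_test: float,
                       precision: float, recall: float) -> float:
    """Largest per-candidate ML latency at which filtering still helps."""
    return recall * (1.0 - p / precision) * t_test


if __name__ == "__main__":
    # Hypothetical numbers: 5% of candidates are correct, a full
    # build-and-test cycle takes 300 s, and the detector scores
    # 0.60 precision / 0.80 recall at 2 s per candidate.
    p, t_test, t_ml = 0.05, 300.0, 2.0
    precision, recall = 0.60, 0.80

    print(f"baseline cost per validated patch: {expected_cost_baseline(p, t_test):8.1f} s")
    print(f"filtered cost per validated patch: {expected_cost_filtered(p, t_test, t_ml, precision, recall):8.1f} s")
    print(f"latency budget for the detector:   {max_useful_latency(p, t_test, precision, recall):8.1f} s")
```

With these placeholder numbers the filtered pipeline costs roughly 550 s per validated patch against 6000 s for testing alone, and the detector could take up to about 220 s per candidate before pre-filtering stopped paying off; the same functions can be fed the precision/recall figures published for real vulnerability detectors to reproduce the kind of speed comparison the abstract describes.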