🤖 AI Summary
Floating-point programs in safety-critical domains (e.g., military systems) may exhibit severe numerical errors that only rare inputs trigger. Existing detection techniques suffer from prohibitive computational overhead, because they rely on expensive high-precision verification, and they scale poorly when exploring distant regions of the input space. This paper proposes MGDE, an approach that embeds the Newton-Raphson method into a differential evolution framework, enabling mathematically guided, efficient, long-range exploration of the input space. By exploiting gradient information, MGDE accelerates convergence and substantially reduces how often costly high-precision validation must run. Experimental results show that MGDE detects 89 bugs across 44 single-input programs, compared with 48 bugs across 29 programs for the state-of-the-art FPCC, which also takes 6.4 times as long. On multi-input programs, MGDE identifies a total of nine bugs with an average detection time of 0.644 seconds per program, whereas FPCC detects none despite averaging 100 seconds per program.
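The summary does not spell out how MGDE couples Newton-Raphson with differential evolution, so the following is only a minimal sketch under our own assumptions, not the authors' algorithm. It assumes a single-input program, a hypothetical program under test `f(x) = x*x - 2`, and the heuristic that inputs near zeros of the output are error-prone (catastrophic cancellation makes the relative error blow up there). A DE/rand/1 loop proposes trial inputs, a few Newton steps pull each trial toward a root, and selection uses the cheap surrogate `|g(x)|`, so the expensive high-precision oracle would only be needed for the final candidates. All names (`g`, `dg`, `newton_refine`, `de_newton`) are hypothetical.

```python
import math
import random


def g(x: float) -> float:
    # Hypothetical guide function: the output of the program under test,
    # f(x) = x*x - 2. Its relative error is largest near the roots +/- sqrt(2).
    return x * x - 2.0


def dg(x: float) -> float:
    # Analytical derivative of g, used by the Newton-Raphson step.
    return 2.0 * x


def newton_refine(x: float, steps: int = 3) -> float:
    # A few Newton-Raphson steps x <- x - g(x)/g'(x) toward a nearby root.
    for _ in range(steps):
        d = dg(x)
        if d == 0.0:
            break  # flat tangent: the Newton step is undefined
        x -= g(x) / d
    return x


def surrogate(x: float) -> float:
    # Cheap fitness: smaller |g(x)| means closer to a suspected error-inducing root.
    return abs(g(x))


def de_newton(lo: float, hi: float, pop_size: int = 20,
              generations: int = 30, F: float = 0.5, seed: int = 0) -> float:
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        for i in range(pop_size):
            # DE/rand/1 mutation from three distinct other members. With a single
            # input variable, binomial crossover always keeps the mutant, so it is omitted.
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            trial = min(max(a + F * (b - c), lo), hi)   # clamp to the search bounds
            trial = min(max(newton_refine(trial), lo), hi)  # mathematical guidance step
            if surrogate(trial) <= surrogate(pop[i]):  # greedy one-to-one selection
                pop[i] = trial
    return min(pop, key=surrogate)


best = de_newton(-10.0, 10.0)
print(best)  # a candidate input near +/- sqrt(2), to be confirmed by a high-precision oracle
```

The design choice being illustrated is that DE supplies long-range, derivative-free exploration, while the Newton step turns each coarse proposal into a near-root candidate in a handful of iterations.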
📝 Abstract
Floating-point program errors can lead to severe consequences, particularly in critical domains such as military applications. Often only a small subset of inputs induces substantial floating-point errors, which has prompted researchers to develop methods for identifying these error-inducing inputs. Although existing approaches have achieved some success, they still suffer from two major limitations: (1) High computational cost: measuring the error of each candidate input requires running a high-precision version of the program, which is prohibitively time-consuming. (2) Limited long-range convergence: current methods search inefficiently and converge poorly toward error-inducing inputs far from where the search starts, making the process akin to finding a needle in a haystack.
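Limitation (1) stems from comparing each candidate's double-precision result against a higher-precision re-evaluation. The sketch below illustrates that comparison under stated assumptions: Python's standard `decimal` module stands in for the high-precision program (the abstract does not say which arbitrary-precision tool FPCC or MGDE uses), and `sqrt(x + 1) - sqrt(x)` is a textbook cancellation example, not a program from the FPCC dataset. The helper names are hypothetical.

```python
import math
from decimal import Decimal, localcontext


def f_double(x: float) -> float:
    # Hypothetical program under test, evaluated in IEEE-754 double precision.
    # For large x the two square roots nearly cancel.
    return math.sqrt(x + 1.0) - math.sqrt(x)


def f_oracle(x: float, digits: int = 60) -> Decimal:
    # Stand-in for the high-precision version of the program:
    # the same formula evaluated with `digits` significant decimal digits.
    with localcontext() as ctx:
        ctx.prec = digits
        d = Decimal(x)  # exact conversion of the binary double
        return (d + 1).sqrt() - d.sqrt()


def relative_error(x: float) -> float:
    # Error magnitude of the double-precision result relative to the oracle.
    exact = f_oracle(x)
    approx = Decimal(f_double(x))  # also an exact conversion
    if exact == 0:
        return 0.0 if approx == 0 else math.inf
    return float(abs((approx - exact) / exact))


print(relative_error(4.0))   # tiny: close to machine epsilon
print(relative_error(1e15))  # large: cancellation destroys most digits
```

For `x = 1e15` both square roots are multiples of 2^-28, so their computed difference can only be a small integer multiple of that spacing, and the relative error exceeds 5%. Every oracle call is much slower than the double-precision evaluation, which is why a search that calls it for every candidate becomes expensive.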
To address these two limitations, we propose a novel method, named MGDE, that detects error-inducing inputs using mathematical guidance. By employing the Newton-Raphson method, which converges quadratically, MGDE achieves both high effectiveness and high efficiency. Since the goal of identifying error-inducing inputs is to uncover the underlying bugs, we use the number of bugs detected in floating-point programs as the primary evaluation metric in our experiments. Because FPCC is the most effective state-of-the-art approach to date, we use it as the baseline for comparison. FPCC's dataset consists of 88 single-input floating-point programs. FPCC detects 48 bugs across 29 programs, whereas our method identifies 89 bugs across 44 programs. Moreover, FPCC takes 6.4096 times as long as our method. We also apply our method to multi-input programs, where it identifies a total of nine bugs with an average detection time of 0.6443 seconds per program. In contrast, FPCC fails to detect any bugs while requiring an average of 100 seconds per program.
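Quadratic convergence means that, near a simple root, each Newton-Raphson error is roughly proportional to the square of the previous one, so the number of correct digits roughly doubles per step. The following generic illustration uses `g(x) = x^2 - 2`, an arbitrary example rather than a component of MGDE, and a hypothetical helper `newton_raphson`.

```python
import math
from typing import Callable, List


def newton_raphson(g: Callable[[float], float],
                   dg: Callable[[float], float],
                   x0: float, steps: int) -> List[float]:
    # Returns the iterates x0, x1, ... produced by x_{k+1} = x_k - g(x_k) / g'(x_k).
    xs = [x0]
    x = x0
    for _ in range(steps):
        d = dg(x)
        if d == 0.0:
            break  # flat tangent: the Newton step is undefined
        x = x - g(x) / d
        xs.append(x)
    return xs


xs = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 5)
errors = [abs(x - math.sqrt(2.0)) for x in xs]
for k, e in enumerate(errors):
    print(k, e)
```

For this function the error obeys e_{k+1} = e_k^2 / (2 x_k) exactly, so the printed errors fall from about 0.41 to roughly 1.6e-12 after four steps, after which the iterate sits at the double-precision limit.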