AI Summary
This work addresses the longstanding trade-off in floating-point debugging between efficiency and accuracy: lightweight methods suffer from high false-report rates, while high-precision approaches incur prohibitive computational overhead. To reconcile this tension, the authors propose a two-step residue computation strategy that splits residue evaluation into rounding error computation and residue function evaluation, leveraging error-free transformations and high-precision arithmetic emulation to improve numerical fidelity. They also introduce residue override, a mechanism that mitigates absorption by re-executing the program multiple times and assembling the per-run residues into a "stitched" patchwork execution. Empirical evaluation shows that the method eliminates false reports on 10 of the 14 problematic benchmarks among 44 scientific computing workloads and substantially reduces them on 3 more; on 169 standard benchmarks drawn from numerical analysis papers and textbooks, residue override requires an average of only 3.6 re-executions and reduces false reports on 25 of the 34 benchmarks that initially produced them.
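To make the two-step idea concrete, here is a minimal Python sketch, not the paper's implementation: Knuth's TwoSum error-free transformation recovers the exact rounding error of a double-precision addition (step 1), and a hypothetical residue rule for addition then folds the operands' residues together with that error (step 2). The function names and the residue convention (residue = real value minus floating-point value) are illustrative assumptions.

```python
# Minimal sketch (not the paper's implementation) of the two steps.

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and
    a + b == s + e exactly, so e is the exact rounding error."""
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e

def add_with_residue(x: float, rx: float, y: float, ry: float):
    """Hypothetical residue rule for '+', assuming residue = real - float:
    the real sum (x + rx) + (y + ry) equals s + (rx + ry + e)."""
    s, e = two_sum(x, y)
    return s, rx + ry + e

# The tiny addend is absorbed into the rounded sum, but TwoSum
# recovers it exactly as the rounding error.
s, e = two_sum(1.0, 2.0**-60)
print(s, e)  # 1.0 8.673617379884035e-19  (e == 2**-60)
```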
Abstract
Floating-point arithmetic is error-prone and unintuitive. Floating-point debuggers instrument programs to monitor floating-point arithmetic at run time and flag numerical issues. They estimate residues, i.e., the difference between the actual floating-point value and the ideal real value, for every floating-point value in the program. Prior work explores various approaches for computing these residues accurately and efficiently. Unfortunately, the most efficient methods, based on "error-free transformations", have a high rate of false reports, while the most accurate methods, based on high-precision arithmetic, are very slow. This paper builds on error-free-transformations-based approaches and aims to improve their accuracy while preserving efficiency. To more accurately compute residues, this paper divides residue computation into two steps (rounding error computation and residue function evaluation) and shows how to perform each step accurately via careful improvements to the current state of the art. We evaluate on 44 large scientific computing workloads, focusing on the 14 benchmarks where prior tools produce false reports: our approach eliminates false reports on 10 benchmarks and substantially reduces them on 3 of the remaining benchmarks. Moreover, complex numerical issues require additional care due to absorption, where two machine-precision residues cannot both be computed accurately in a single execution. This paper introduces residue override, which re-executes the program multiple times, computing different residues in different executions and assembling a final "patchwork" execution. We evaluate on 169 standard benchmarks drawn from numerical analysis papers and textbooks, requiring only 3.6 re-executions on average. Among the 34 benchmarks with false reports in the initial run, residue override is triggered on 29 and reduces false reports on 25, averaging 7.1 re-executions.
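The absorption problem the abstract raises shows up directly when residues are themselves tracked as doubles: once one residue exceeds another by more than a factor of roughly 2^53, their sum carries no trace of the smaller one, so a single execution cannot report both accurately. A tiny illustrative demo (the magnitudes are invented for the example):

```python
# Illustrative absorption demo (magnitudes invented for the example):
# residues tracked in double precision lose the smaller contribution
# entirely once the magnitude gap exceeds roughly 2**53.
r_big = 1e10 * 2.0**-53    # residue of a large intermediate (~1.1e-6)
r_tiny = 1e-10 * 2.0**-53  # residue of a small intermediate (~1.1e-26)

combined = r_big + r_tiny
print(combined == r_big)   # True: r_tiny was absorbed
print(combined - r_big)    # 0.0 -- no trace of r_tiny remains
```

As the abstract describes it, residue override sidesteps this by computing different residues in different re-executions and stitching the results into a single "patchwork" execution.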