🤖 AI Summary
The order in which floating-point values are accumulated is often undocumented in software and hardware, which makes numerical results hard to reproduce across platforms. This paper introduces FPRev, a non-intrusive diagnostic tool that infers the accumulation order of black-box implementations on CPUs, GPUs, and Tensor Cores through numerical testing alone, without access to source code. It feeds the implementation specially designed inputs and analyzes how floating-point rounding shapes each output in order to reconstruct the implicit reduction sequence. FPRev supports consistency checks across libraries (e.g., NumPy, PyTorch) and across devices. The experiments reveal the accumulation orders used by popular libraries on different platforms, and FPRev has a lower time complexity than the basic solution. The implementation will be open-sourced on GitHub.
📝 Abstract
Accumulation-based operations, such as summation and matrix multiplication, are fundamental to numerous computational domains. However, their accumulation orders are often undocumented in existing software and hardware implementations, making it difficult for developers to ensure consistent results across systems. To address this issue, we introduce FPRev, a diagnostic tool that reveals the accumulation order of software and hardware implementations through numerical testing. With FPRev, developers can identify and compare accumulation orders, which helps them create reproducible software and verify that two implementations are equivalent. FPRev is testing-based and non-intrusive: it reveals the accumulation order by analyzing the outputs that the tested implementation produces for a set of specially designed inputs. Using FPRev, we showcase the accumulation orders of popular libraries (such as NumPy and PyTorch) on CPUs and GPUs (including GPUs with specialized matrix accelerators such as Tensor Cores). We also validate the efficiency of FPRev through extensive experiments. FPRev has a lower time complexity than the basic solution. FPRev will be open-sourced on GitHub.
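The testing-based idea in the abstract can be illustrated with a small sketch. This is not FPRev's algorithm or API. It is a minimal demonstration, under stated assumptions, of why specially designed inputs can expose the accumulation order of a black-box summation. The names `BIG`, `probe`, `fingerprint`, `sequential_sum`, and `pairwise_sum` are hypothetical, and the two summation functions stand in for implementations whose order would normally be unknown. The probe places one very large value and its negation at two positions and fills the rest with ones. Any one added to the large value before the two cancel is lost to rounding, so the output depends on the order of additions.

```python
from itertools import combinations

# Assumption: IEEE 754 double precision, where BIG + k == BIG for small counts k.
BIG = 2.0 ** 100


def sequential_sum(xs):
    """Left-to-right accumulation: ((x0 + x1) + x2) + ..."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc


def pairwise_sum(xs):
    """Recursive halving: sum(left half) + sum(right half). Requires len(xs) >= 1."""
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


def probe(sum_fn, n, i, j):
    """Run sum_fn on n ones, with +BIG at index i and -BIG at index j."""
    xs = [1.0] * n
    xs[i] = BIG
    xs[j] = -BIG
    return sum_fn(xs)


def fingerprint(sum_fn, n):
    """Collect the probe output for every index pair (i, j) with i < j."""
    return {(i, j): probe(sum_fn, n, i, j) for i, j in combinations(range(n), 2)}


if __name__ == "__main__":
    n = 4
    print("sequential:", fingerprint(sequential_sum, n))
    print("pairwise:  ", fingerprint(pairwise_sum, n))
```

For `n = 4` and the pair `(0, 2)`, the sequential version returns `1.0` (only the last one survives) while the pairwise version returns `0.0` (each one is absorbed into a large value before the halves are combined), so the two orders produce different fingerprints. Treating the summation function purely as a callable is what keeps this kind of test non-intrusive.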