Mixed precision accumulation for neural network inference guided by componentwise forward error analysis

📅 2025-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the accuracy-efficiency trade-off in accumulator computations during neural network inference. We propose a condition-number-driven mixed-precision accumulation method. Building on a componentwise forward error propagation model, we establish theoretically that the accumulation error in each output component scales with the product of the condition number of the weight-input inner product and the condition number of the activation function. Guided by this insight, we design a fine-grained precision scheduling mechanism: low-precision initial computation followed by selective high-precision recomputation of error-sensitive components. Evaluated across multiple models and datasets, the method preserves original model accuracy while reducing accumulator computational cost by over 30%, significantly outperforming uniform-precision baselines. Key contributions include: (1) a theoretical componentwise forward error model; (2) a dynamic precision allocation paradigm coupling the two condition numbers (weight-input inner product and activation); and (3) a lightweight, adaptive recomputation mechanism.

📝 Abstract
This work proposes a mathematically founded mixed precision accumulation strategy for the inference of neural networks. Our strategy is based on a new componentwise forward error analysis that explains the propagation of errors in the forward pass of neural networks. Specifically, our analysis shows that the error in each component of the output of a layer is proportional to the condition number of the inner product between the weights and the input, multiplied by the condition number of the activation function. These condition numbers can vary widely from one component to the other, thus creating a significant opportunity to introduce mixed precision: each component should be accumulated in a precision inversely proportional to the product of these condition numbers. We propose a practical algorithm that exploits this observation: it first computes all components in low precision, uses this output to estimate the condition numbers, and recomputes in higher precision only the components associated with large condition numbers. We test our algorithm on various networks and datasets and confirm experimentally that it can significantly improve the cost–accuracy tradeoff compared with uniform precision accumulation baselines.
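The three-step algorithm from the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of float16 as the low precision, `tanh` as the activation, the threshold `tau`, and the function name `mixed_precision_layer` are all assumptions made here for concreteness. The per-component condition number of the inner product is estimated with the standard summation condition number Σ|w_ij x_j| / |Σ w_ij x_j|, and the activation's condition number with |s f'(s) / f(s)|.

```python
import numpy as np

def mixed_precision_layer(W, x, act=np.tanh, act_cond=None, tau=1e2):
    """Illustrative sketch of condition-number-driven mixed precision.

    1. Accumulate all inner products s_i = <w_i, x> in low precision.
    2. Estimate per-component condition numbers from the low-precision pass.
    3. Recompute only the ill-conditioned components in high precision.
    """
    # Step 1: low-precision (float16) accumulation of all components.
    W16, x16 = W.astype(np.float16), x.astype(np.float16)
    s_lo = (W16 @ x16).astype(np.float64)

    # Step 2a: condition number of each inner product,
    # kappa_i = sum_j |w_ij x_j| / |sum_j w_ij x_j|, also in low precision.
    numer = (np.abs(W16) @ np.abs(x16)).astype(np.float64)
    kappa_dot = numer / np.maximum(np.abs(s_lo), np.finfo(np.float64).tiny)

    # Step 2b: condition number of the activation, kappa_f(s) = |s f'(s) / f(s)|.
    # For tanh, f'(s) = 1 - tanh(s)^2 (assumed default here).
    if act_cond is None:
        act_cond = lambda s: np.abs(s * (1.0 - np.tanh(s) ** 2)) / \
                             np.maximum(np.abs(np.tanh(s)), np.finfo(np.float64).tiny)
    kappa = kappa_dot * act_cond(s_lo)

    # Step 3: selective recomputation in high precision (float64) of the
    # components whose combined condition number exceeds the threshold.
    recompute = kappa > tau
    s = s_lo.copy()
    s[recompute] = W[recompute].astype(np.float64) @ x.astype(np.float64)
    return act(s), recompute
```

Only the rows flagged by `recompute` pay the cost of a high-precision pass, which is how the cost–accuracy tradeoff improves over uniform precision when few components are ill-conditioned.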
Problem

Research questions and friction points this paper is trying to address.

Develops mixed precision accumulation for neural network inference.
Uses componentwise forward error analysis to guide precision selection.
Improves cost-accuracy tradeoff by recomputing high-error components.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed precision accumulation guided by error analysis
Componentwise forward error analysis for neural networks
Condition number-based precision adjustment algorithm