🤖 AI Summary
Optimizing rounding decisions in post-training quantization (PTQ) for low-bitwidth integer representations (int1–int8) remains challenging because of the combinatorial nature of discrete rounding.
Method: This paper introduces the first exact Quadratic Unconstrained Binary Optimization (QUBO) formulation built on the ADAROUND framework. It jointly optimizes the binary rounding of weights and biases to minimize the Frobenius-norm error between the dequantized outputs and the full-precision activations. By exploiting structural properties of the coefficient matrix, the large-scale QUBO decomposes exactly into smaller, efficiently solvable subproblems, which are then solved with heuristic algorithms such as simulated annealing.
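To make the objective concrete, the rounding error of a single output neuron can be written as a QUBO over the binary round-down/round-up choices. The sketch below is an illustrative construction under simplifying assumptions (one weight row `w`, calibration inputs `x`, a single scale `s`, bias omitted); the symbols are not the paper's exact notation:

```python
import numpy as np

def rounding_qubo(w, x, s):
    """QUBO for AdaRound-style rounding of one weight row.

    Quantized weights: w_q = s * (floor(w / s) + b), with b_j in {0, 1}
    (round down vs. round up).  The squared output error
    ||(w - w_q) @ x||^2 is quadratic in b; returns (Q, c, const) so that
    the error equals b @ Q @ b + c @ b + const.
    """
    r = w - s * np.floor(w / s)   # residual left over when rounding down
    G = x @ x.T                   # Gram matrix of the calibration inputs
    Q = s**2 * G                  # quadratic coefficients
    c = -2.0 * s * (G @ r)        # linear coefficients
    const = r @ G @ r             # error of rounding every weight down (b = 0)
    return Q, c, const
```

For a tiny row, enumerating all $2^f$ rounding choices confirms that the QUBO energy coincides with the directly computed rounding error for every assignment.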
Contribution/Results: Evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10, the method significantly outperforms conventional rounding, improving accuracy by 1.5–4.2 percentage points at int2–int4 bitwidths. It represents the first exact QUBO-driven rounding optimization applicable across the full integer quantization spectrum.
📝 Abstract
This work introduces a post-training quantization (PTQ) method for dense neural networks based on a novel ADAROUND-style QUBO formulation. Taking as objective the Frobenius distance between the full-precision output and the dequantized output (before the activation function), we obtain an explicit QUBO whose binary variables encode the rounding choice for each weight and bias. Moreover, by exploiting the structure of the QUBO coefficient matrix, the global problem decomposes exactly into $n$ independent subproblems of size $f+1$, each of which can be solved efficiently with heuristics such as simulated annealing. The approach is evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10 across integer precisions from int8 down to int1 and compared with traditional round-to-nearest quantization.
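Each of the $n$ subproblems is then a small QUBO over $f+1$ binary variables. A generic single-flip simulated-annealing routine for one such subproblem might look like the following (a standard SA sketch with geometric cooling, not the authors' solver; `Q` is assumed symmetric and the energy is $E(b) = b^\top Q b + c^\top b$):

```python
import numpy as np

def sa_qubo(Q, c, steps=2000, t_hot=1.0, t_cold=0.01, seed=0):
    """Minimize E(b) = b @ Q @ b + c @ b over b in {0,1}^n by simulated annealing."""
    rng = np.random.default_rng(seed)
    n = len(c)
    b = rng.integers(0, 2, size=n)
    energy = b @ Q @ b + c @ b
    best_b, best_e = b.copy(), energy
    for step in range(steps):
        t = t_hot * (t_cold / t_hot) ** (step / steps)  # geometric cooling schedule
        j = rng.integers(n)
        d = 1 - 2 * b[j]  # +1 if flipping 0 -> 1, -1 if flipping 1 -> 0
        # Exact energy change of flipping bit j (uses symmetry of Q).
        delta = d * (2.0 * (Q[j] @ b - Q[j, j] * b[j]) + Q[j, j] + c[j])
        if delta < 0 or rng.random() < np.exp(-delta / t):
            b[j] += d
            energy += delta
            if energy < best_e:
                best_b, best_e = b.copy(), energy
    return best_b, best_e
```

On a separable toy instance (diagonal `Q`), the routine recovers the known optimum; in the decomposed setting, one such solve would be run independently per output neuron.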