🤖 AI Summary
Optimizing rounding decisions in post-training quantization (PTQ) for low-bitwidth integer representations (int1–int8) remains challenging because of the combinatorial nature of discrete rounding.
Method: This paper introduces the first exact Quadratic Unconstrained Binary Optimization (QUBO) formulation built on the ADAROUND framework. It jointly optimizes the binary rounding of weights and biases to minimize the Frobenius-norm error between the dequantized outputs and the full-precision activations. By exploiting structural properties of the coefficient matrix, the large-scale QUBO decomposes exactly into smaller, efficiently solvable subproblems, which are then solved with heuristic algorithms such as simulated annealing.
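To make the objective concrete, the rounding error of a single output neuron can be written as a QUBO over the binary round-down/round-up choices. The sketch below is an illustrative construction under simplifying assumptions (one weight row `w`, calibration inputs `x`, a single scale `s`, bias omitted); the symbols are not the paper's exact notation:

```python
import numpy as np

def rounding_qubo(w, x, s):
    """QUBO for AdaRound-style rounding of one weight row.

    Quantized weights: w_q = s * (floor(w / s) + b), with b_j in {0, 1}
    (round down vs. round up).  The squared output error
    ||(w - w_q) @ x||^2 is quadratic in b; returns (Q, c, const) so that
    the error equals b @ Q @ b + c @ b + const.
    """
    r = w - s * np.floor(w / s)   # residual left over when rounding down
    G = x @ x.T                   # Gram matrix of the calibration inputs
    Q = s**2 * G                  # quadratic coefficients
    c = -2.0 * s * (G @ r)        # linear coefficients
    const = r @ G @ r             # error of rounding every weight down (b = 0)
    return Q, c, const
```

For a tiny row, enumerating all $2^f$ rounding choices confirms that the QUBO energy coincides with the directly computed rounding error for every assignment.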
Contribution/Results: Evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10, the method significantly outperforms conventional rounding, improving accuracy by 1.5–4.2 percentage points at int2–int4 bitwidths. It represents the first exact QUBO-driven rounding optimization applicable across the full integer quantization spectrum.
📝 Abstract
This work introduces a post-training quantization (PTQ) method for dense neural networks based on a novel ADAROUND-style QUBO formulation. Taking as objective the Frobenius distance between the full-precision output and the dequantized output (before the activation function), we obtain an explicit QUBO whose binary variables encode the rounding choice for each weight and bias. Moreover, by exploiting the structure of the QUBO coefficient matrix, the global problem decomposes exactly into $n$ independent subproblems of size $f+1$, each of which can be solved efficiently with heuristics such as simulated annealing. The approach is evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10 across integer precisions from int8 down to int1 and compared with traditional round-to-nearest quantization.
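Each of the $n$ subproblems is then a small QUBO over $f+1$ binary variables. A generic single-flip simulated-annealing routine for one such subproblem might look like the following (a standard SA sketch with geometric cooling, not the authors' solver; `Q` is assumed symmetric and the energy is $E(b) = b^\top Q b + c^\top b$):

```python
import numpy as np

def sa_qubo(Q, c, steps=2000, t_hot=1.0, t_cold=0.01, seed=0):
    """Minimize E(b) = b @ Q @ b + c @ b over b in {0,1}^n by simulated annealing."""
    rng = np.random.default_rng(seed)
    n = len(c)
    b = rng.integers(0, 2, size=n)
    energy = b @ Q @ b + c @ b
    best_b, best_e = b.copy(), energy
    for step in range(steps):
        t = t_hot * (t_cold / t_hot) ** (step / steps)  # geometric cooling schedule
        j = rng.integers(n)
        d = 1 - 2 * b[j]  # +1 if flipping 0 -> 1, -1 if flipping 1 -> 0
        # Exact energy change of flipping bit j (uses symmetry of Q).
        delta = d * (2.0 * (Q[j] @ b - Q[j, j] * b[j]) + Q[j, j] + c[j])
        if delta < 0 or rng.random() < np.exp(-delta / t):
            b[j] += d
            energy += delta
            if energy < best_e:
                best_b, best_e = b.copy(), energy
    return best_b, best_e
```

On a separable toy instance (diagonal `Q`), the routine recovers the known optimum; in the decomposed setting, one such solve would be run independently per output neuron.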