Optimization of the quantization of dense neural networks from an exact QUBO formulation

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Optimizing rounding decisions in post-training quantization (PTQ) for ultra-low-bitwidth representations (int1–int8) remains challenging due to the combinatorial nature of discrete rounding. Method: This paper introduces the first exact Quadratic Unconstrained Binary Optimization (QUBO) formulation built on the ADAROUND framework. It jointly optimizes the binary rounding of weights and biases to minimize the Frobenius-norm error between dequantized outputs and full-precision activations. By exploiting structural properties of the coefficient matrix, the large-scale QUBO decomposes into smaller, efficiently solvable subproblems, which are then solved with heuristic algorithms such as simulated annealing. Contribution/Results: Evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10, the method significantly outperforms conventional round-to-nearest quantization, improving accuracy by 1.5–4.2 percentage points at int2–int4 bitwidths. It is the first exact QUBO-driven rounding optimization applicable across the full integer quantization spectrum.
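To make the per-neuron subproblem concrete, the following Python sketch builds the QUBO matrix for a single output neuron from its floor-rounding residuals. The function name, the residual convention, and the treatment of the bias as an extra always-one input are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def rounding_qubo(w, b, s, X):
    """Hypothetical sketch: QUBO for one neuron's rounding choices.

    w : (f,) full-precision weights; b : scalar bias; s : quantization step;
    X : (f, m) calibration activations (m samples).
    Binary variable v_k in {0,1} picks floor (0) or ceil (1) rounding for
    weight k; the last variable handles the bias.  Minimizing v^T Q v
    (up to an additive constant) minimizes the squared error between the
    full-precision and dequantized pre-activations.
    """
    f, m = X.shape
    # Residual incurred by rounding down: e_k = s*floor(w_k/s) - w_k.
    e = s * np.floor(w / s) - w
    e_b = s * np.floor(b / s) - b
    # Append the bias as an extra input channel that is always 1.
    A = np.vstack([X, np.ones((1, m))])   # (f+1, m)
    r = np.concatenate([e, [e_b]])        # (f+1,) floor residuals
    # Error over samples: || A^T (r + s*v) ||^2
    #   = r^T G r + 2s (G r)^T v + s^2 v^T G v,  with G = A A^T.
    G = A @ A.T
    Q = (s ** 2) * G                      # quadratic (pairwise) terms
    lin = 2.0 * s * (G @ r)               # linear terms
    # Fold linear terms onto the diagonal, since v_k^2 == v_k for binaries.
    Q[np.diag_indices_from(Q)] += lin
    return Q
```

The constant term $r^\top G r$ is dropped, as it does not affect the argmin; the matrix has size $(f+1)\times(f+1)$, matching the subproblem size stated in the abstract.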

📝 Abstract
This work introduces a post-training quantization (PTQ) method for dense neural networks based on a novel ADAROUND-inspired QUBO formulation. Using the Frobenius distance between the theoretical output and the dequantized output (before the activation function) as the objective, an explicit QUBO is obtained whose binary variables represent the rounding choice for each weight and bias. Moreover, by exploiting the structure of the QUBO coefficient matrix, the global problem decomposes exactly into $n$ independent subproblems of size $f+1$, which can be solved efficiently with heuristics such as simulated annealing. The approach is evaluated on MNIST, Fashion-MNIST, EMNIST, and CIFAR-10 across integer precisions from int8 down to int1 and compared with a traditional round-to-nearest quantization baseline.
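A sketch of why the objective separates, under assumed notation (the paper's exact symbols may differ): with weights $W \in \mathbb{R}^{n\times f}$, biases $b \in \mathbb{R}^n$, calibration inputs $X \in \mathbb{R}^{f\times m}$, and rounding bits $V \in \{0,1\}^{n\times(f+1)}$ selecting floor or ceil for each weight and bias,

```latex
% Assumed notation: \hat{W}(V), \hat{b}(V) are the dequantized
% weights/biases induced by the rounding choices V.
\min_{V \in \{0,1\}^{n\times(f+1)}}
  \bigl\| \bigl(WX + b\mathbf{1}^{\top}\bigr)
        - \bigl(\hat{W}(V)X + \hat{b}(V)\mathbf{1}^{\top}\bigr) \bigr\|_F^{2}
  \;=\;
  \sum_{i=1}^{n} \bigl\| \bigl(w_i - \hat{w}_i(v_i)\bigr)^{\top} X
        + \bigl(b_i - \hat{b}_i(v_i)\bigr)\mathbf{1}^{\top} \bigr\|_2^{2}.
```

Since the squared Frobenius norm is the sum of squared row norms and row $i$ of the error depends only on that neuron's $f+1$ rounding bits $v_i$, the global problem splits exactly into $n$ independent QUBOs of size $f+1$, as the abstract states.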
Problem

Research questions and friction points this paper is trying to address.

Optimizing neural network quantization via exact QUBO formulation
Decomposing global quantization problem into independent subproblems
Evaluating quantization precision from int8 to int1 across datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-training quantization via ADAROUND-based QUBO formulation
Decomposes global QUBO into independent subproblems using matrix structure
Solves subproblems efficiently with simulated annealing heuristics
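The innovation points above can be illustrated with a minimal single-flip simulated-annealing solver for one such subproblem, $\min_v v^\top Q v$ over $v \in \{0,1\}^{f+1}$. This is an illustrative annealer with a geometric cooling schedule, not the paper's solver; all names and parameters are assumptions.

```python
import numpy as np

def anneal_qubo(Q, n_steps=20000, T0=1.0, seed=0):
    """Illustrative simulated annealing for min_v v^T Q v, v in {0,1}^n.

    Q must be symmetric.  Single-bit flips; uphill moves accepted with
    probability exp(-delta/T) under geometric cooling T_t = T0 * 0.999^t.
    Returns the best binary vector found and its energy.
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    v = rng.integers(0, 2, size=n).astype(float)
    energy = v @ Q @ v
    best_v, best_e = v.copy(), energy
    for t in range(n_steps):
        T = T0 * (0.999 ** t)
        k = rng.integers(n)
        d = 1.0 - 2.0 * v[k]                 # +1 if flipping 0->1, -1 if 1->0
        # Exact energy change of flipping bit k (for symmetric Q):
        delta = 2.0 * d * (Q[k] @ v) + Q[k, k]
        if delta <= 0 or rng.random() < np.exp(-delta / max(T, 1e-12)):
            v[k] += d
            energy += delta
            if energy < best_e:
                best_e, best_v = energy, v.copy()
    return best_v, best_e
```

Because the subproblems are independent, a full layer can be quantized by running such a solver on each of the $n$ size-$(f+1)$ QUBOs separately, which is what makes the decomposition practically useful.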
Sergio Muñiz Subiñas
Quantum Computing and Tensor Network researcher
Manuel L. González
Instituto Tecnológico de Castilla y León, Burgos, Spain
Jorge Ruiz Gómez
Instituto Tecnológico de Castilla y León, Burgos, Spain
Alejandro Mata Ali
Quantum Team Coordinator, ITCL / Lecturer, MIAX (BME) / Teacher
Quantum Computing · tensor networks · applied mathematics
Jorge Martínez Martín
Instituto Tecnológico de Castilla y León, Burgos, Spain
Miguel Franco Hernando
Instituto Tecnológico de Castilla y León, Burgos, Spain
Ángel Miguel García-Vico
Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Jaén, 23071 Jaén, Spain