Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of deploying deep neural networks on resource-constrained 6G edge devices, where models must be compressed aggressively with minimal accuracy loss. Existing mixed-precision quantization methods assign bit-widths at coarse layer- or channel-level granularity, so they can miss neuron-level variation in precision requirements. To overcome this limitation, the authors propose Neuron-Level Mixed-Precision Quantization-Aware Training (NMP-QAT), in which each neuron learns its own discrete bit-width during training, starting from low precision and increasing it only when training signals demand it. The scheme covers both weights and activations. By combining differentiable surrogates and straight-through estimators with a fully discrete inference graph, NMP-QAT achieves better compression-accuracy trade-offs than mixed-precision QAT baselines on MLP and tabular foundation model architectures, across both telecom and non-telecom datasets, which makes it a candidate for Green AI deployment at the network edge.
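To make the compression argument concrete, the sketch below works out the weight storage of a single dense layer when each output neuron carries its own bit-width. It is only illustrative arithmetic: the helper names `layer_weight_bits` and `compression_ratio`, the 32-bit baseline, and the example bit-widths are assumptions, not values taken from the paper.

```python
def layer_weight_bits(fan_in, bits_per_neuron):
    """Total weight storage in bits when each output neuron has its own bit-width."""
    return sum(fan_in * b for b in bits_per_neuron)


def compression_ratio(fan_in, bits_per_neuron, baseline_bits=32):
    """Storage of an assumed full-precision baseline divided by the mixed-precision storage."""
    baseline = fan_in * len(bits_per_neuron) * baseline_bits
    return baseline / layer_weight_bits(fan_in, bits_per_neuron)


# Hypothetical layer: 8 neurons with 16 inputs each.
# Six neurons stay at 2 bits; two were widened to 4 and 8 bits.
bits = [2, 2, 2, 2, 2, 2, 4, 8]
total = layer_weight_bits(16, bits)
ratio = compression_ratio(16, bits)

# For comparison: the same layer quantized uniformly to 8 bits.
uniform_ratio = compression_ratio(16, [8] * 8)

print(total, round(ratio, 2), uniform_ratio)
```

Under these assumed numbers, widening only two neurons keeps the layer well below the storage of a uniform 8-bit layer, which is the kind of trade-off finer granularity is meant to expose.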
📝 Abstract
Deploying deep neural networks on resource-constrained 6G edge devices demands aggressive compression with minimal accuracy loss. Quantization-Aware Training (QAT) has emerged as a leading compression approach; however, existing mixed-precision methods typically operate at coarse layer- or channel-level granularity. These methods often rely on heuristic or search-based bit-allocation strategies, which may overlook fine-grained variability at the neuron level. We propose Neuron-Level Mixed-Precision QAT (NMP-QAT), where each neuron independently learns its own discrete precision during training. Starting from low-bit precision, NMP-QAT expands bit-width only when training signals demand it, via differentiable surrogates and straight-through estimators, while preserving a fully discrete inference graph. This adaptability extends to both weights and activations, reducing memory movement. Evaluated on telecom and non-telecom datasets across MLP and tabular foundation model architectures, NMP-QAT achieves superior compression-accuracy trade-offs over mixed-precision QAT baselines, making it well-suited for Green AI deployments at the network edge.
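The mechanism described in the abstract can be sketched roughly as follows, using plain Python so that the forward and backward steps stay explicit. This is a minimal sketch under stated assumptions, not the authors' implementation: the symmetric uniform quantizer, the 2-to-8-bit range, the continuous "latent" bit-width per neuron, and all function names are hypothetical. The rule that decides when a neuron's bit-width grows is not given in the abstract and is deliberately left out.

```python
def discrete_bits(latent, min_bits=2, max_bits=8):
    """Round a continuous latent bit-width to an integer and clamp it (assumed range)."""
    return max(min_bits, min(max_bits, int(round(latent))))


def fake_quantize(row, bits):
    """Symmetric uniform quantization of one neuron's weights to `bits` bits."""
    levels = 2 ** (bits - 1) - 1  # largest integer code, e.g. 1 for 2 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in row)
    if max_abs == 0.0:
        return list(row)  # nothing to scale
    scale = max_abs / levels
    # Round to the nearest integer code, clamp, then map back to real values.
    return [max(-levels, min(levels, round(w / scale))) * scale for w in row]


def quantize_layer(weights, latent_bits):
    """Quantize each row (one output neuron) with that neuron's own discrete bit-width."""
    bits = [discrete_bits(b) for b in latent_bits]
    return [fake_quantize(row, b) for row, b in zip(weights, bits)], bits


def ste_backward(upstream_grad):
    """Straight-through estimator: rounding is treated as identity in the backward pass."""
    return list(upstream_grad)


# Two neurons with identical weights but different (hypothetical) latent bit-widths.
weights = [[0.6, -1.0, 0.26], [0.6, -1.0, 0.26]]
q, bits = quantize_layer(weights, latent_bits=[2.0, 4.2])

# Worst-case quantization error per neuron.
errors = [max(abs(a - b) for a, b in zip(w_row, q_row)) for w_row, q_row in zip(weights, q)]
print(bits, errors)
```

In this toy case the neuron that kept 2 bits has a noticeably larger worst-case error than the one widened to 4 bits. At inference only the integer codes and per-neuron scales would be needed, which is one way to read the abstract's "fully discrete inference graph".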
Problem

Research questions and friction points this paper is trying to address.

mixed-precision quantization
quantization-aware training
neuron-level granularity
edge AI
model compression
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neuron-level quantization
Mixed-precision QAT
Differentiable surrogate
Straight-through estimator
Green AI