FraQAT: Quantization Aware Training with Fractional bits

📅 2025-10-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address memory and computational bottlenecks in deploying diffusion models on mobile devices, this paper proposes Fractional-bit Quantization-Aware Training (FraQAT), mitigating severe generative quality degradation during progressive quantization from 32-bit to 4-bit precision. The method features: (1) a differentiable fractional-bit quantization scheme that enables fine-grained information preservation between integer bit-widths; and (2) a progressive precision decay strategy coupled with gradient compensation to stabilize low-bit training dynamics. Evaluated on SD3.5-Medium, Sana, PixArt, and FLUX.1-schnell, FraQAT achieves a 4–7% improvement in Fréchet Inception Distance (FID) over baseline quantization methods, marking the first successful high-fidelity generation at 4-bit precision. The approach has been deployed on the Samsung S25U smartphone (powered by the Snapdragon 8 Elite HTP), enabling efficient on-device inference.
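
The core idea, a quantizer whose bit-width can take non-integer values between integer precisions, can be sketched as a uniform fake-quantizer where the number of grid levels is derived from a fractional bit-width. This is an illustrative sketch under that assumption, not the paper's exact differentiable scheme; the function name `fake_quantize` is hypothetical.

```python
import numpy as np

def fake_quantize(w, bits):
    """Uniform fake-quantization of `w` to a (possibly fractional) bit-width.

    With fractional `bits`, the level count 2**bits is non-integer; rounding
    it yields grids that sit between integer bit-widths (e.g. bits=4.5 gives
    ~23 levels, between the 16 of INT4 and the 32 of INT5). Sketch only; the
    paper's scheme is differentiable and trained end-to-end.
    """
    levels = max(2, round(2.0 ** bits))          # fractional bits -> intermediate grid
    scale = (w.max() - w.min()) / (levels - 1)   # per-tensor uniform step size
    q = np.round((w - w.min()) / scale)          # snap to the nearest grid level
    return q * scale + w.min()                   # dequantize ("fake-quant") weights

w = np.linspace(-1.0, 1.0, 100)
for b in (8.0, 4.5, 4.0):
    err = np.abs(fake_quantize(w, b) - w).mean()
    print(f"{b:>4} bits: mean |error| = {err:.5f}")
```

The quantization error grows smoothly as `bits` decays, which is what lets training adapt gradually instead of jumping between integer precisions.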

📝 Abstract
State-of-the-art (SOTA) generative models have demonstrated impressive capabilities in image synthesis and text generation, often with large-capacity models. However, these large models cannot be deployed on smartphones due to the limited availability of on-board memory and compute. Quantization methods lower the precision of the model parameters, allowing for efficient computation, e.g., in INT8. Although aggressive quantization addresses efficiency and memory constraints, preserving the quality of the model remains a challenge. To retain quality under aggressive quantization, we propose a new fractional-bit quantization (FraQAT) approach. The novelty is a simple yet effective idea: we progressively reduce the model's precision from 32 to 4 bits per parameter, and exploit the fractional bits during optimization to maintain high generation quality. We show that FraQAT yields improved quality on a variety of diffusion models, including SD3.5-Medium, Sana, PixArt, and FLUX.1-schnell, while achieving 4–7% lower FID than standard QAT. Finally, we deploy and run Sana on a Samsung S25U, which runs on the Qualcomm SM8750-AB Snapdragon 8 Elite Hexagon Tensor Processor (HTP).
Problem

Research questions and friction points this paper is trying to address.

Quantization reduces model precision for mobile deployment
Aggressive quantization degrades generative model quality
Fractional bit optimization maintains quality during precision reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive precision reduction from 32 to 4 bits
Fractional bits exploitation during optimization phase
Maintains generation quality in aggressive quantization
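
The progressive 32-to-4-bit reduction above can be sketched as a precision schedule that passes through fractional bit-widths during training. The linear decay, the function name `precision_schedule`, and its parameters are assumptions for illustration; the paper's actual decay strategy may differ.

```python
def precision_schedule(step, total_steps, start_bits=32.0, end_bits=4.0):
    """Bit-width for a given training step, decayed linearly from
    `start_bits` to `end_bits`.

    Intermediate values are fractional (e.g. 18.0 or 7.3 bits), which is
    where a fractional-bit quantizer is exploited: precision shrinks
    smoothly rather than jumping between integer bit-widths. Illustrative
    sketch only.
    """
    t = min(max(step / total_steps, 0.0), 1.0)   # training progress in [0, 1]
    return start_bits + (end_bits - start_bits) * t

# Example: bit-width at the start, middle, and end of a 1000-step run.
for step in (0, 500, 1000):
    print(step, precision_schedule(step, 1000))
```

Each step's weights would then be fake-quantized at the scheduled (fractional) precision before the forward pass, so the model adapts continuously on its way down to 4 bits.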
Luca Morreale
University College London
Computer Vision · 3D Reconstruction · Geometric Processing
Alberto Gil C. P. Ramos
Samsung Electronics (AI Center)
Applied Mathematics
Malcolm Chadwick
Samsung AI Center, Cambridge, UK
Mehdi Noroozi
Samsung AI Center, Cambridge, UK
Ruchika Chavhan
PhD student, University of Edinburgh
Generative Models · Diffusion Models · Representation Learning
A. Mehrotra
Samsung AI Center, Cambridge, UK
Sourav Bhattacharya
Samsung AI Center, Cambridge, UK