1.58-bit FLUX

📅 2024-12-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high memory and computational overhead of FLUX.1-dev, a state-of-the-art text-to-image diffusion model, during 1024×1024 high-resolution generation, this work presents the first 1.58-bit quantization of the model, constraining weights to the ternary set {−1, 0, +1}. The authors design custom inference kernels optimized for 1.58-bit operations and integrate them end-to-end with the FLUX.1-dev architecture. Crucially, the approach requires no image data or fine-tuning, relying solely on self-supervision from FLUX.1-dev itself and thereby sidestepping the data dependency and optimization difficulties typical of ultra-low-bit quantization. Experiments show a 7.7× reduction in model storage, a 5.1× reduction in GPU memory use during inference, and improved latency, all while preserving generation quality comparable to the full-precision baseline on GenEval and T2I-CompBench. The work establishes a practical path for efficient deployment of high-fidelity text-to-image models without compromising visual fidelity.
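The summary's core idea, constraining each weight to {−1, 0, +1}, can be sketched in a few lines. The paper does not spell out its quantization function, so the snippet below uses the common absmean ternarization (a per-tensor scale equal to the mean absolute weight); it is an illustrative sketch under that assumption, not the authors' implementation, and the `ternarize` helper is a hypothetical name.

```python
import numpy as np

def ternarize(W: np.ndarray, eps: float = 1e-8):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale.

    Illustrative absmean scheme; the 1.58-bit FLUX paper does not
    specify its exact quantization function, so this is a sketch,
    not the authors' method.
    """
    scale = np.abs(W).mean() + eps             # per-tensor absmean scale
    W_q = np.clip(np.round(W / scale), -1, 1)  # ternary codes in {-1, 0, +1}
    return W_q.astype(np.int8), scale          # dequantize as W_q * scale

# Toy usage: quantize a random layer and check the codebook
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(256, 256)).astype(np.float32)
W_q, s = ternarize(W)
assert set(np.unique(W_q)).issubset({-1, 0, 1})
```

Storing only the int8 codes (packable to 2 bits each) plus one float scale per tensor is what makes the large storage and memory savings possible.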

📝 Abstract
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 × 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I-CompBench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.
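The "1.58-bit" name and the reported 7.7x storage reduction both follow from simple arithmetic, sketched below. The 16-bit baseline and 2-bit packed storage are assumptions (the paper reports only the final ratios, not this breakdown), so treat this as back-of-the-envelope accounting, not the paper's exact figures.

```python
import math

# "1.58-bit": information content of a uniform 3-symbol alphabet {-1, 0, +1}
bits_per_weight = math.log2(3)  # ~1.585 bits

# Practical storage packs each ternary code into 2 bits. Against an
# assumed 16-bit (bf16/fp16) baseline, that bounds compression at 8x;
# the reported 7.7x is consistent with a small fraction of parameters
# (and per-tensor scales) kept at higher precision.
baseline_bits = 16
packed_bits = 2
ideal_ratio = baseline_bits / packed_bits  # 8.0

print(round(bits_per_weight, 3), ideal_ratio)
```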
Problem

Research questions and friction points this paper is trying to address.

Large Image Generation
Resource Efficiency
Model Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

1.58-bit FLUX
Ultra-Low Precision Conversion
Efficiency Enhancement