🤖 AI Summary
To address the high memory and computational overhead of FLUX.1-dev, a state-of-the-art text-to-image diffusion model, during 1024×1024 high-resolution generation, this work proposes the first data-free, self-supervised ternary quantization method for the model, constraining weights to {−1, 0, +1} (about 1.58 bits per weight). The authors develop custom inference kernels optimized for ternary weights and integrate them end-to-end with the FLUX.1-dev architecture. Crucially, the approach requires no image data or fine-tuning, sidestepping the data dependency and optimization difficulties typical of ultra-low-bit quantization. Experiments show a 7.7× reduction in model storage, a 5.1× reduction in GPU memory during inference, and improved inference latency, all while preserving generation quality comparable to the full-precision baseline on GenEval and T2I-CompBench. The result suggests that high-fidelity text-to-image models can be deployed far more efficiently without sacrificing visual quality.
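To make the "1.58-bit" idea concrete, here is a minimal sketch of ternary weight quantization with a per-tensor scale. The summary does not specify the exact rounding rule, so the absmean scheme below (as popularized by BitNet b1.58) is an illustrative assumption, not necessarily the authors' method; the function names are likewise hypothetical.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Quantize a list of weights to {-1, 0, +1} plus one float scale.

    Absmean scheme (assumed, not confirmed by the paper): divide by the
    mean absolute value, round to the nearest integer, clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate weights: each code is -scale, 0, or +scale."""
    return [c * scale for c in codes]

# "1.58-bit" because a ternary weight carries log2(3) ≈ 1.585 bits.
weights = [0.9, -0.05, -1.3, 0.4, 0.02, -0.7]
codes, scale = ternary_quantize(weights)
print(codes)  # → [1, 0, -1, 1, 0, -1]
```

Storing only the int codes plus a single scale per tensor is what drives the model-size reduction; the custom kernels then compute matmuls directly on the ternary codes rather than dequantizing to full precision first.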
📝 Abstract
We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024×1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7× reduction in model storage, a 5.1× reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I-CompBench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.