Fairy$\pm i$: the First 2-bit Complex LLM with All Parameters in $\{\pm 1, \pm i\}$

📅 2025-08-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing quantization-aware training (QAT) methods treat full-precision model accuracy as the de facto upper bound for 2-bit quantized models, with no prior approach surpassing this ceiling. Method: We propose Fairy±i—the first large language model (LLM) in which *all* parameters reside strictly in the complex fourth roots of unity {±1, ±i}. Leveraging the perfectly symmetric, information-theoretically optimal 2-bit representation these complex units enable, Fairy±i achieves multiplication-free, high-accuracy inference. Its complex-valued QAT maps weights directly to {±1, ±i}, and inference employs only additions and element-wise permutations—eliminating multiplications entirely. Contribution/Results: Fairy±i maintains strict 2-bit storage and computational efficiency while outperforming all existing 2-bit methods in both perplexity (PPL) and downstream task accuracy. Crucially, it is the first to empirically demonstrate that ultra-low-bit *complex-valued* LLMs can exceed the full-precision accuracy ceiling—a fundamental breakthrough in quantized LLM design.
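The direct mapping to the fourth roots of unity can be sketched as a nearest-neighbour projection onto the 2-bit codebook. The Euclidean nearest-neighbour rule below is an illustrative assumption, not necessarily the paper's exact quantizer:

```python
# Fourth roots of unity: the entire 2-bit complex codebook.
ROOTS = [1 + 0j, -1 + 0j, 1j, -1j]

def quantize_weight(w: complex) -> complex:
    """Project a full-precision complex weight onto the nearest codebook
    entry in {+1, -1, +i, -i} (nearest-neighbour in Euclidean distance;
    an illustrative rule, not necessarily the paper's exact quantizer)."""
    return min(ROOTS, key=lambda r: abs(w - r))

# Note that every quantized weight has a zero real or zero imaginary part,
# which is what later makes inference multiplication-free.
print([quantize_weight(w) for w in (0.9 + 0.1j, -0.2 + 0.8j, -0.7 - 0.6j)])
```

Because each codebook entry lies on an axis, storing a weight takes exactly 2 bits (one of four symbols), with no sign or scale stored separately.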

📝 Abstract
Quantization-Aware Training (QAT) integrates quantization into the training loop, enabling LLMs to learn robust low-bit representations, and is widely recognized as one of the most promising research directions. All current QAT research focuses on minimizing quantization error on full-precision models, where the full-precision accuracy acts as an upper bound (accuracy ceiling). No existing method has even attempted to surpass this ceiling. To break this ceiling, we propose a new paradigm: raising the ceiling (full-precision model), and then still quantizing it efficiently into 2 bits. We propose Fairy$\pm i$, the first 2-bit quantization framework for complex-valued LLMs. Specifically, our method leverages the representational advantages of the complex domain to boost full-precision accuracy. We map weights to the fourth roots of unity $\{\pm 1, \pm i\}$, forming a perfectly symmetric and information-theoretically optimal 2-bit representation. Importantly, each quantized weight has either a zero real or imaginary part, enabling multiplication-free inference using only additions and element swaps. Experimental results show that Fairy$\pm i$ outperforms the ceiling of existing 2-bit quantization approaches in terms of both PPL and downstream tasks, while maintaining strict storage and compute efficiency. This work opens a new direction for building highly accurate and practical LLMs under extremely low-bit constraints.
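The multiplication-free claim follows from basic complex arithmetic: multiplying an activation $a + bi$ by $\pm 1$ only negates its parts, and multiplying by $\pm i$ swaps them (with one negation). A minimal sketch of such a dot product, assuming an illustrative 2-bit code convention (0:+1, 1:−1, 2:+i, 3:−i) rather than the paper's actual encoding:

```python
def mul_free_apply(code: int, re: float, im: float) -> tuple[float, float]:
    """Apply one 2-bit weight to a complex activation (re + im*i) using
    only negation and component swaps -- no real multiplications.
    Illustrative code convention: 0:+1, 1:-1, 2:+i, 3:-i."""
    if code == 0:        # * (+1): identity
        return re, im
    if code == 1:        # * (-1): negate both components
        return -re, -im
    if code == 2:        # * (+i): (a + bi)*i  = -b + ai  (swap, negate)
        return -im, re
    return im, -re       # * (-i): (a + bi)*(-i) = b - ai (swap, negate)

def mul_free_dot(codes, activations):
    """Dot product of a {+-1, +-i} weight row with complex activations,
    accumulated using additions only."""
    acc_re = acc_im = 0.0
    for code, (re, im) in zip(codes, activations):
        r, i = mul_free_apply(code, re, im)
        acc_re += r
        acc_im += i
    return acc_re, acc_im
```

For example, a row with weights (+1, +i) applied to activations (1 + 2i, 3 + 4i) accumulates (1 + 2i) + (−4 + 3i) = −3 + 5i without a single multiply; a real hardware kernel would fold the swap/negate into the accumulation, but the arithmetic content is the same.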
Problem

Research questions and friction points this paper is trying to address.

Breaks accuracy ceiling in 2-bit LLM quantization
Uses complex domain for optimal 2-bit representation
Enables multiplication-free inference with additions/swaps
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses complex domain for 2-bit quantization
Employs quantization-aware training (QAT) paradigm
Enables multiplication-free inference via additions