FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference

📅 2025-10-18
🤖 AI Summary
Collaborative LLM inference on edge devices is bottlenecked by the transmission of high-dimensional intermediate activations, a cost that autoregressive decoding makes scale linearly with output length. To address this, the paper proposes the first layer-aware activation compression framework that exploits frequency-domain sparsity. It observes that first-layer Transformer activations concentrate their energy in low-frequency Fourier components, enabling near-lossless reconstruction via conjugate symmetry. The framework integrates FFT-based coefficient truncation with hardware acceleration tailored for DSP/FPGA platforms. Evaluated on Llama 3 and Qwen2.5, it achieves an average 7.6× compression ratio with <0.3% accuracy degradation while compressing over 32× faster than Top-k. The work balances high compression efficiency, minimal reconstruction error, and real-time feasibility for edge-deployed LLM inference.

📝 Abstract
Collaborative large language model (LLM) inference enables real-time, privacy-preserving AI services on resource-constrained edge devices by partitioning computational workloads between client devices and edge servers. However, this paradigm is severely hindered by communication bottlenecks caused by the transmission of high-dimensional intermediate activations, exacerbated by the autoregressive decoding structure of LLMs, where bandwidth consumption scales linearly with output length. Existing activation compression methods struggle to simultaneously achieve high compression ratios, low reconstruction error, and computational efficiency. This paper proposes FourierCompress, a novel, layer-aware activation compression framework that exploits the frequency-domain sparsity of LLM activations. We rigorously demonstrate that activations from the first Transformer layer exhibit strong smoothness and energy concentration in the low-frequency domain, making them highly amenable to near-lossless compression via the Fast Fourier Transform (FFT). FourierCompress transforms activations into the frequency domain, retains only a compact block of low-frequency coefficients, and reconstructs the signal at the server using conjugate symmetry, enabling seamless hardware acceleration on DSPs and FPGAs. Extensive experiments on Llama 3 and Qwen2.5 models across 10 commonsense reasoning datasets demonstrate that FourierCompress preserves performance remarkably close to the uncompressed baseline, outperforming Top-k, QR, and SVD. FourierCompress bridges the gap between communication efficiency (an average 7.6x reduction in activation size), near-lossless inference (less than 0.3% average accuracy loss), and significantly faster compression (achieving over 32x reduction in compression time compared to Top-k via hardware acceleration) for edge-device LLM inference.
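The compress-truncate-reconstruct pipeline the abstract describes can be sketched in a few lines of NumPy. This is an illustrative toy on a smooth 1-D signal, not the paper's DSP/FPGA implementation; the signal, its length `n`, and the retained-coefficient count `k` are all assumptions chosen for the example.

```python
import numpy as np

def fourier_compress(x, k):
    """Keep only the k lowest-frequency rFFT coefficients of a real vector."""
    coeffs = np.fft.rfft(x)        # n//2 + 1 complex coefficients
    return coeffs[:k]              # compact low-frequency block to transmit

def fourier_reconstruct(low, n):
    """Rebuild a length-n real signal; irfft exploits conjugate symmetry."""
    full = np.zeros(n // 2 + 1, dtype=complex)
    full[: low.shape[0]] = low     # zero-fill the truncated high frequencies
    return np.fft.irfft(full, n=n)

# Smooth toy "activation": energy concentrated at low frequencies.
n = 1024
t = np.linspace(0, 1, n, endpoint=False)
x = np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

k = 64                             # retain 64 of 513 complex coefficients
low = fourier_compress(x, k)
x_hat = fourier_reconstruct(low, n)

rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
ratio = n / (2 * k)                # n real values vs. 2k real values sent
```

Because the toy signal's energy sits entirely below the cutoff, reconstruction here is essentially exact at an 8× ratio; real first-layer activations are only approximately low-frequency, which is where the reported <0.3% accuracy loss comes from.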
Problem

Research questions and friction points this paper is trying to address.

Reducing communication bottlenecks in collaborative LLM inference
Compressing high-dimensional activations with minimal accuracy loss
Achieving efficient compression via frequency-domain sparsity exploitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Fast Fourier Transform for activation compression
Retains only low-frequency coefficients for efficiency
Leverages conjugate symmetry for hardware acceleration
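The conjugate-symmetry property the last bullet relies on is easy to verify numerically: for any real-valued input, the FFT satisfies X[n−k] = conj(X[k]), so the server can rebuild the full spectrum from the first half alone. A quick NumPy check (not from the paper):

```python
import numpy as np

x = np.random.default_rng(0).standard_normal(8)  # any real signal
X = np.fft.fft(x)
n = len(x)

# For real input, the upper half of the spectrum is redundant:
# X[n-k] is the complex conjugate of X[k] for k = 1..n-1.
for k in range(1, n):
    assert np.allclose(X[n - k], np.conj(X[k]))
```

This redundancy is why transmitting only the low-frequency half loses nothing for real activations, and why `np.fft.rfft`/`irfft` (and their DSP/FPGA counterparts) store roughly half the coefficients in the first place.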
Jian Ma
Beijing University of Posts and Telecommunications, China, and Pengcheng Laboratory, China
Xinchen Lyu
Beijing University of Posts and Telecommunications
Fog computing, edge caching, SDN
Jun Jiang
Pengcheng Laboratory, China
Longhao Zou
Pengcheng Laboratory, China
Chenshan Ren
Minzu University of China, China
Qimei Cui
Professor, School of Information and Communication Engineering, Beijing University of Posts and
B5G/6G wireless communications, mobile computing and IoT
Xiaofeng Tao
Beijing University of Posts and Telecommunications
wireless communication