🤖 AI Summary
Post-training compression of pre-trained large language models (LLMs) on resource-constrained devices remains challenging, particularly when the original training data is unavailable and high-rank weight matrices impede conventional low-rank tensor decomposition methods.
Method: We propose Sparse-Augmented Tensor Network (Saten), the first end-to-end differentiable, post-training tensorization framework for full-model compression. Saten integrates tensor-train (TT) network parameterization, structured sparsity regularization, and fine-tuning-aware tensor decomposition.
Contribution/Results: Saten operates without access to pre-training data and overcomes the performance bottlenecks of traditional tensor networks under high-rank constraints. Experiments across multiple downstream tasks demonstrate state-of-the-art (SOTA) accuracy: an average +2.3% accuracy gain at equivalent compression ratios, a 15× reduction in parameter count, and an 82% reduction in inference GPU memory consumption.
📝 Abstract
The efficient implementation of large language models (LLMs) is crucial for deployment on resource-constrained devices. Low-rank tensor compression techniques, such as tensor-train (TT) networks, have been widely studied for over-parameterized neural networks. However, their application to compressing pre-trained LLMs for downstream tasks (post-training) remains challenging due to the high-rank nature of pre-trained LLMs and the lack of access to pre-training data. In this study, we investigate low-rank tensorized LLMs during fine-tuning and propose sparse-augmented tensor networks (Saten) to enhance their performance. The proposed Saten framework enables full-model compression. Experimental results demonstrate that Saten improves both accuracy and compression efficiency in tensorized language models, achieving state-of-the-art performance.
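To make the "sparse-augmented" idea concrete, the sketch below shows a minimal matrix-level analogue of the decomposition: a truncated SVD stands in for the tensor-train factorization, and the largest-magnitude entries of the residual form the sparse augmentation that captures the high-rank component a pure low-rank model misses. This is an illustration of the concept only, not the paper's actual TT-based implementation; the `rank` and `sparsity` parameters are hypothetical knobs.

```python
import numpy as np

def sparse_augmented_lowrank(W, rank, sparsity):
    """Approximate W as (low-rank part L) + (sparse part S).

    L: rank-`rank` truncated SVD (stand-in for a TT factorization).
    S: the `sparsity` fraction of residual entries with the largest
       magnitude, kept to recover high-rank structure.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]          # low-rank approximation
    R = W - L                                         # residual error
    k = int(sparsity * W.size)                        # number of entries to keep
    thresh = np.partition(np.abs(R).ravel(), -k)[-k]  # k-th largest magnitude
    S = np.where(np.abs(R) >= thresh, R, 0.0)         # sparse augmentation
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))                     # toy "weight matrix"
L, S = sparse_augmented_lowrank(W, rank=8, sparsity=0.05)
err_lowrank = np.linalg.norm(W - L) / np.linalg.norm(W)
err_augmented = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
```

Because `S` zeroes out only part of the residual, the augmented approximation `L + S` is strictly more accurate than `L` alone at the cost of storing a small number of extra sparse entries, which is the trade-off Saten exploits for high-rank pre-trained weights.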