RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices

📅 2024-12-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high memory footprint of RNN-based large language models (e.g., RWKV) when deployed on resource-constrained edge devices such as mobile robots and smartphones, this paper proposes the first end-to-end model compression framework tailored to the RWKV architecture. The framework integrates lightweight architectural redesign, structured pruning, post-training quantization, and knowledge distillation, achieving a 3.4x–5x memory reduction with negligible accuracy degradation. Moreover, compared to Transformer-based models of comparable accuracy, the compressed RWKV models require roughly 4x less memory. This work is the first systematic treatment of lightweight deployment for RWKV-style models, offering a reproducible and broadly compatible path to efficient large-model inference on edge devices.
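One of the techniques the summary names is post-training quantization. The sketch below is a generic illustration of symmetric int8 weight quantization, not the paper's specific method; the function names and the 1024x1024 example tensor are assumptions chosen to show the 4x storage reduction from fp32 to int8 weights.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor post-training quantization to int8.

    A generic sketch: real frameworks typically use per-channel scales
    and calibration data, which this example omits.
    """
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an fp32 approximation of the original weights."""
    return q.astype(np.float32) * scale

# Example: quantizing one fp32 weight matrix shrinks its storage 4x.
rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)
print(w.nbytes // q.nbytes)  # 4
```

The per-element reconstruction error of this scheme is bounded by half the quantization step (`0.5 * scale`), which is why weight-only int8 quantization typically costs little accuracy while cutting weight memory by 4x relative to fp32.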

📝 Abstract
To deploy LLMs on resource-constrained platforms such as mobile robots and smartphones, non-transformer LLMs have achieved major breakthroughs. Recently, a novel RNN-based LLM family, Receptance Weighted Key Value (RWKV), has shown strong computational efficiency; nevertheless, RWKV models still have high parameter counts, which limits their deployment. In this paper, we propose a suite of compression techniques, ranging from model architecture optimizations to post-training compression, tailored to the RWKV architecture. Combined, our techniques reduce the memory footprint of RWKV models by 3.4x–5x with only negligible degradation in accuracy; compared to transformer LLMs with similar accuracy, our models require a 4x smaller memory footprint.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Memory Efficiency
Computational Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

RWKV-Lite
Super Compression
Memory Efficiency