RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices

📅 2024-12-14

📈 Citations: 0

✨ Influential: 0

career value

164K/year

🤖 AI Summary

To address the high memory footprint and low computational efficiency of RNN-based large language models (e.g., RWKV) when deployed on resource-constrained edge devices such as mobile robots and smartphones, this paper proposes the first end-to-end model compression framework tailored for the RWKV architecture. The framework integrates lightweight architectural redesign, structured pruning, post-training quantization, and knowledge distillation. It achieves 3.4×–5× memory reduction with negligible accuracy degradation. Moreover, compared to Transformer-based models of equivalent accuracy, the compressed RWKV models require approximately 4× less memory. This work constitutes the first systematic solution to the lightweight deployment challenge for RWKV-style models, delivering a reproducible, highly compatible technical pathway for efficient large-model inference on edge devices.

Technology Category

Application Category

📝 Abstract

To deploy LLMs on resource-contained platforms such as mobile robots and smartphones, non-transformers LLMs have achieved major breakthroughs. Recently, a novel RNN-based LLM family, Repentance Weighted Key Value (RWKV) has shown strong computational efficiency; nevertheless, RWKV models still have high parameter counts which limited their deployment. In this paper, we propose a suite of compression techniques, ranging from model architecture optimizations to post-training compression, tailored to the RWKV architecture. Combined, our techniques reduce the memory footprint of RWKV models by 3.4x -- 5x with only negligible degradation in accuracy; compared to transformer LLMs with similar accuracy, our models require 4x less memory footprint.

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Memory Efficiency

Computational Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

RWKV-Lite

Super Compression

Memory Efficiency

🔎 Similar Papers

No similar papers found.