🤖 AI Summary
Existing LLM compression methods overemphasize perplexity or accuracy on simple tasks, neglecting higher-order capabilities such as retrieval-augmented generation (RAG), multi-step reasoning, external tool invocation, and computational expressivity. To address this, we propose the "Lottery LLM Hypothesis": for a given LLM and task, there exists a smaller lottery LLM that, with the assistance of multi-step reasoning and external tools, can match the original model's performance on complex tasks. Based on a review of recent progress in these areas, we discuss and summarize the essential capabilities that a lottery LLM and KV cache compression methods must preserve, capabilities that current compression methods largely overlook. This perspective motivates capability-aware evaluation criteria and an application-driven agenda for efficient LLM compression.
📝 Abstract
Motivated by reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted much attention from researchers. However, current methods predominantly emphasize maintaining the performance of compressed LLMs as measured by perplexity or simple accuracy on tasks such as commonsense question answering and basic arithmetic reasoning. In this blog, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose the lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of achieving the same performance as the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression methods must possess, capabilities that are currently overlooked in existing methods.