AI Summary
To address the insufficient co-optimization of compression formats and dataflows in sparse large language model (LLM) accelerator design, this paper proposes SnipSnap, a unified optimization framework. Its key contributions are: (1) a hierarchical compression format encoding scheme enabling fine-grained format modeling; (2) an adaptive compression engine that dynamically matches compression formats to input sparsity patterns; and (3) a progressive co-search methodology that jointly optimizes compression formats and dataflows within a unified design space. Experimental evaluation demonstrates that SnipSnap reduces memory energy consumption by 18.24% on average and achieves search speedups of 2248.3× and 21.0× over the Sparseloop and DiMO-Sparse frameworks, respectively. These improvements enhance both the energy efficiency of the resulting sparse LLM accelerators and the speed of the design space exploration itself.
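To make the idea of format selection under varying sparsity concrete, here is a toy sketch. It is not SnipSnap's actual engine; the formats (dense, bitmap, CSR-like), bit widths, and cost model are illustrative assumptions chosen only to show how the cheapest encoding for a weight tile shifts with its nonzero count.

```python
# Illustrative sketch (NOT SnipSnap's algorithm): pick a compression
# format for a weight tile by estimated storage cost. Assumes 16-bit
# values and 32-bit indices/pointers; real cost models also account
# for decode hardware and dataflow-dependent access patterns.

def estimated_bits(fmt, rows, cols, nnz):
    """Rough storage cost of one tile, in bits, for a given format."""
    if fmt == "dense":
        return rows * cols * 16                   # every element stored
    if fmt == "bitmap":
        return rows * cols * 1 + nnz * 16         # 1-bit mask + nonzero values
    if fmt == "csr":
        return (rows + 1) * 32 + nnz * (32 + 16)  # row ptrs + col idx + values
    raise ValueError(f"unknown format: {fmt}")

def pick_format(rows, cols, nnz):
    """Choose the cheapest format for a tile with the given nonzero count."""
    return min(("dense", "bitmap", "csr"),
               key=lambda f: estimated_bits(f, rows, cols, nnz))

# Moderately sparse tiles favor bitmap; very sparse tiles favor CSR.
print(pick_format(64, 64, 2048))  # 50% nonzeros -> bitmap
print(pick_format(64, 64, 32))    # ~0.8% nonzeros -> csr
```

Even this crude model shows why a single fixed format is suboptimal across layers with different sparsity levels, which is the motivation for an adaptive engine that matches formats to input sparsity patterns.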
Abstract
The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerators. This paper proposes SnipSnap, a joint compression format and dataflow co-optimization framework for efficient sparse LLM accelerator design. SnipSnap introduces: (1) a hierarchical compression format encoding to expand the design space; (2) an adaptive compression engine for selecting formats under diverse sparsity; and (3) a progressive co-search workflow that jointly optimizes dataflow and compression formats. SnipSnap achieves 18.24% average memory energy savings via format optimization, along with 2248.3× and 21.0× search speedups over the Sparseloop and DiMO-Sparse frameworks, respectively.