SnipSnap: A Joint Compression Format and Dataflow Co-Optimization Framework for Efficient Sparse LLM Accelerator Design

📅 2025-09-21
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the insufficient co-optimization of compression formats and dataflows in sparse large language model (LLM) accelerator design, this paper proposes SnipSnap, a unified optimization framework. Its key contributions are: (1) a hierarchical compression format encoding scheme that enables fine-grained format modeling; (2) an adaptive compression engine that matches compression formats to input sparsity patterns; and (3) a progressive co-search methodology that jointly optimizes compression formats and dataflows within a unified design space. Experimental evaluation shows that SnipSnap reduces memory energy consumption by 18.24% on average and achieves speedups of 2248.3× and 21.0× over the Sparseloop and DiMO-Sparse frameworks, respectively. These improvements enhance the energy efficiency and throughput of sparse LLM accelerators.
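To make the "adaptive compression engine" idea concrete, here is a minimal sketch of selecting a compression format by estimated storage footprint. The format set (dense, bitmap, coordinate), the cost model, and all bit widths are illustrative assumptions for this sketch, not SnipSnap's actual engine or cost model.

```python
# Illustrative sketch (NOT the paper's engine): pick the cheapest
# compression format for a fiber of `length` elements with `nnz` nonzeros.

def storage_bits(fmt: str, length: int, nnz: int, value_bits: int = 16) -> int:
    """Estimated bits to store one fiber under a given format (toy cost model)."""
    if fmt == "dense":        # every element stored, zeros included
        return length * value_bits
    if fmt == "bitmap":       # one presence bit per position + nonzero values
        return length + nnz * value_bits
    if fmt == "coordinate":   # one index per nonzero + nonzero values
        index_bits = max(1, (length - 1).bit_length())
        return nnz * (index_bits + value_bits)
    raise ValueError(f"unknown format: {fmt}")

def pick_format(length: int, nnz: int) -> str:
    """Adaptively choose the format with the smallest estimated footprint."""
    return min(("dense", "bitmap", "coordinate"),
               key=lambda f: storage_bits(f, length, nnz))
```

Under this toy model, dense wins when the fiber is nearly full, coordinate wins when it is very sparse, and bitmap wins in between; an adaptive engine exploits exactly this kind of sparsity-dependent crossover.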

πŸ“ Abstract
The growing scale of large language models (LLMs) has intensified demands on computation and memory, making efficient inference a key challenge. While sparsity can reduce these costs, existing design space exploration (DSE) frameworks often overlook compression formats, a key factor for leveraging sparsity on accelerators. This paper proposes SnipSnap, a joint compression format and dataflow co-optimization framework for efficient sparse LLM accelerator design. SnipSnap introduces: (1) a hierarchical compression format encoding to expand the design space; (2) an adaptive compression engine for selecting formats under diverse sparsity; and (3) a progressive co-search workflow that jointly optimizes dataflow and compression formats. SnipSnap achieves 18.24% average memory energy savings via format optimization, along with 2248.3× and 21.0× speedups over the Sparseloop and DiMO-Sparse frameworks, respectively.
Problem

Research questions and friction points this paper is trying to address.

High computation and memory demands of large language model inference
Existing DSE frameworks overlook compression formats when exploring sparse accelerator designs
How to co-optimize compression formats and dataflows for efficient sparse LLM acceleration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical compression format encoding that expands the design space
Adaptive compression engine that selects formats under diverse sparsity
Progressive co-search workflow that jointly optimizes dataflow and compression formats
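The co-search idea behind the third contribution can be sketched as a two-stage search: shortlist dataflows with a cheap proxy cost, then score (dataflow, format) pairs jointly. All names, cost numbers, and the pruning heuristic below are hypothetical placeholders for illustration, not SnipSnap's actual workflow.

```python
# Hypothetical sketch of a progressive co-search over a joint
# (dataflow, compression format) design space. Costs are made up.
import itertools

DATAFLOW_COST = {"weight-stationary": 1.0, "output-stationary": 0.9,
                 "row-stationary": 0.95}
FORMAT_FACTOR = {"bitmap": 0.7, "coordinate": 0.8, "run-length": 0.75}

def proxy_cost(dataflow: str) -> float:
    # Stage 1: cheap dataflow-only estimate used for early pruning.
    return DATAFLOW_COST[dataflow]

def joint_cost(dataflow: str, fmt: str) -> float:
    # Stage 2: combined estimate over the joint design point.
    return DATAFLOW_COST[dataflow] * FORMAT_FACTOR[fmt]

def co_search(keep: int = 2) -> tuple[str, str]:
    # Progressively narrow the space: prune dataflows first, then
    # evaluate only the surviving (dataflow, format) pairs jointly.
    shortlist = sorted(DATAFLOW_COST, key=proxy_cost)[:keep]
    pairs = itertools.product(shortlist, FORMAT_FACTOR)
    return min(pairs, key=lambda p: joint_cost(*p))
```

The point of the progressive structure is that the full cross-product is never evaluated: cheap single-axis estimates prune most of the space before the more expensive joint evaluation runs, which is how such a co-search can be orders of magnitude faster than exhaustive joint exploration.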