🤖 AI Summary
This work addresses the high inference overhead and bandwidth saturation in Mamba2 models caused by expanded state dimensions, which existing pruning methods struggle to mitigate effectively. The authors propose GHOST, a structured pruning framework that introduces balanced truncation—a concept from control theory—into Mamba2 pruning for the first time. By leveraging forward-pass statistics to jointly assess the controllability and observability of hidden states, GHOST enables efficient, gradient-free pruning. The method incorporates output-aware metrics, grouped hidden state selection, and structured sparsity strategies, achieving 50% compression of the state dimension across models ranging from 130M to 2.7B parameters. This results in only a ~1-point increase in WikiText-2 perplexity, nearly matching the accuracy of gradient-based approaches.
📝 Abstract
While Mamba2's expanded state dimension enhances temporal modeling, it incurs substantial inference overhead that saturates memory bandwidth during autoregressive generation. Standard pruning methods fail to address this bottleneck: unstructured sparsity leaves activations dense, magnitude-based selection ignores runtime dynamics, and gradient-based methods impose prohibitive costs. We introduce GHOST (Grouped Hidden-state Output-aware Selection and Truncation), a structured pruning framework that approximates control-theoretic balanced truncation using only forward-pass statistics. By jointly measuring controllability and observability, GHOST rivals the fidelity of gradient-based methods without requiring backpropagation. On models ranging from 130M to 2.7B parameters, our approach achieves a 50% state-dimension reduction with an increase of approximately 1 perplexity point on WikiText-2. Code is available at https://anonymous.4open.science/r/mamba2_ghost-7BCB/.
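To make the balanced-truncation idea concrete, here is a minimal sketch of how one might score and select state channels from forward-pass statistics alone. It is an illustration under assumed names, not GHOST's actual implementation: `hidden_states` stands for calibration-set activations, `C_out` for a hypothetical output projection, and the proxies below are simple stand-ins for the paper's controllability/observability measures.

```python
import numpy as np

def ghost_like_scores(hidden_states, C_out):
    """Score each state channel, balanced-truncation style.

    hidden_states: (T, N) array of states collected over a calibration run.
    C_out: (d_out, N) output projection mapping states to the output.
    """
    # Controllability proxy: empirical energy of each channel, i.e. how
    # strongly inputs excite it during the forward pass.
    ctrl = np.mean(hidden_states ** 2, axis=0)
    # Observability proxy: squared output-projection energy per channel,
    # i.e. how much each channel can influence the output.
    obs = np.sum(C_out ** 2, axis=0)
    # Joint importance ~ sqrt(ctrl * obs), by analogy with Hankel
    # singular values in classical balanced truncation.
    return np.sqrt(ctrl * obs)

def select_channels(hidden_states, C_out, keep_ratio=0.5):
    """Keep the top-scoring fraction of state channels (e.g. 50%)."""
    scores = ghost_like_scores(hidden_states, C_out)
    k = int(scores.shape[0] * keep_ratio)
    keep = np.argsort(scores)[::-1][:k]
    return np.sort(keep)  # sorted indices of channels to retain
```

A channel is kept only if it is both excited by inputs and visible at the output; a channel that is large but unread, or readable but never excited, scores low, which is the key difference from pure magnitude-based pruning.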