🤖 AI Summary
This work addresses the high computational cost and latency inherent in large language model (LLM) inference. We propose DYNAMAX, the first early-exit (EE) framework tailored to the Mamba architecture. DYNAMAX repurposes Mamba blocks as lightweight exit classifiers that work across architectures (Mamba and Transformer decoder-only LLMs) and integrates a confidence-driven dynamic termination strategy, so inference stops adaptively once a prediction is sufficiently confident. Experiments on Codestral-7B (Mamba) and Mistral-7B (Transformer) demonstrate substantial reductions in FLOPs and end-to-end latency while maintaining competitive accuracy and consistency across multiple benchmarks, including TruthfulQA, CoQA, and TriviaQA. To our knowledge, this is the first study to explore early exits in Mamba-based models; it establishes Mamba's efficacy as a compact, high-performance EE classifier and introduces a new paradigm for efficient LLM deployment under resource constraints.
📝 Abstract
Early exits (EEs) offer a promising approach to reducing computational costs and latency by dynamically terminating inference once a satisfactory prediction confidence on a data sample is achieved. Although many works integrate EEs into encoder-only Transformers, their application to decoder-only architectures and, more importantly, to Mamba models, a novel family of state-space architectures in the LLM realm, remains insufficiently explored. This work introduces DYNAMAX, the first framework to exploit the unique properties of Mamba architectures for early-exit mechanisms. We not only integrate EEs into Mamba but also repurpose Mamba as an efficient EE classifier for both Mamba-based and Transformer-based LLMs, showcasing its versatility. Our experiments compare the Transformer-based Mistral 7B with the Mamba-based Codestral 7B, using datasets such as TruthfulQA, CoQA, and TriviaQA to evaluate computational savings, accuracy, and consistency. The results highlight the adaptability of Mamba as a powerful EE classifier and its efficiency in balancing computational cost and output quality across NLP tasks. By leveraging Mamba's inherent design for dynamic processing, we open pathways for scalable and efficient inference in embedded applications and resource-constrained environments. This study underscores the transformative potential of Mamba in redefining dynamic computing paradigms for LLMs.
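To make the confidence-driven termination idea concrete, here is a minimal sketch of how an early-exit loop typically works: after each layer, a lightweight exit head (in DYNAMAX, a small Mamba block) produces logits, and generation of the token stops as soon as the top softmax probability clears a threshold. The `layers`, `exit_heads`, and `threshold` names are illustrative stand-ins, not the paper's actual modules or hyperparameters.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(hidden, layers, exit_heads, threshold=0.9):
    """Run layers sequentially; after each, an exit head maps the hidden
    state to class logits. If the top softmax probability reaches
    `threshold`, skip the remaining layers and return early.

    Returns (predicted_index, depth_used).
    """
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        hidden = layer(hidden)           # regular backbone computation
        probs = softmax(head(hidden))    # cheap exit classifier
        confidence = max(probs)
        if confidence >= threshold:      # confident enough: exit early
            return probs.index(confidence), depth
    # Fell through: use the final layer's prediction.
    return probs.index(confidence), depth
```

The compute savings come from the skipped layers: an input that exits at depth 3 of a 32-layer model pays roughly 3/32 of the per-token backbone cost, plus the (small) overhead of the exit heads.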