QS4D: Quantization-aware training for efficient hardware deployment of structured state-space sequential models

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Structured State Space Models (SSMs) face significant challenges in deployment on resource-constrained edge devices—particularly analog memristor-based in-memory computing chips—due to poor robustness against analog hardware noise and inadequate support for low-bit quantization. Method: This paper proposes a Quantization-Aware Training (QAT) co-optimization framework tailored for SSMs. It analyzes the trade-off between model size and numerical precision, and integrates structured pruning and noise-robustness enhancement mechanisms to enable algorithm-hardware co-design. Contribution/Results: The framework reduces model complexity by up to two orders of magnitude across several performance metrics, including parameter count and computational cost. It enables deployment on analog memristor chips with substantial gains in inference energy efficiency. To the authors' knowledge, this is the first work to empirically validate the feasibility and advantages of SSMs on analog in-memory computing platforms.

📝 Abstract
Structured State Space Models (SSMs) have recently emerged as a new class of deep learning models, particularly well-suited for processing long sequences. Their constant memory footprint, in contrast to the linearly scaling memory demands of Transformers, makes them attractive candidates for deployment on resource-constrained edge-computing devices. While recent works have explored the effect of quantization-aware training (QAT) on SSMs, they typically do not address its implications for specialized edge hardware, for example, analog in-memory computing (AIMC) chips. In this work, we demonstrate that QAT can significantly reduce the complexity of SSMs by up to two orders of magnitude across various performance metrics. We analyze the relation between model size and numerical precision, and show that QAT enhances robustness to analog noise and enables structural pruning. Finally, we integrate these techniques to deploy SSMs on a memristive analog in-memory computing substrate and highlight the resulting benefits in terms of computational efficiency.
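To make the QAT idea concrete, here is a minimal sketch of the core mechanism most QAT schemes share: "fake quantization," where weights are rounded to a low-bit grid but kept in floating point, so the rest of the computation sees the precision loss that the deployed model will incur. This is an illustrative NumPy example, not the paper's implementation; the bit width, the symmetric per-tensor scaling, and the toy state matrix `A` are all assumptions.

```python
import numpy as np

def fake_quantize(w, bits=4):
    """Symmetric uniform fake quantization: map weights onto a
    (2**bits - 1)-level integer grid, then scale back to float so
    downstream math experiences the rounding error of low precision."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) * 0.1   # toy stand-in for an SSM state matrix
A_q = fake_quantize(A, bits=4)
err = np.max(np.abs(A - A_q))           # bounded by half a quantization step
```

During QAT, `A_q` would replace `A` in the forward pass while gradients update the full-precision `A` (typically via a straight-through estimator), letting the model adapt to the quantization grid before deployment.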
Problem

Research questions and friction points this paper is trying to address.

Optimizing SSMs for efficient edge hardware deployment
Reducing model complexity via quantization-aware training
Enhancing robustness to analog noise in AIMC chips
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantization-aware training reduces SSM complexity
QAT enhances robustness to analog noise
Deploys SSMs on memristive AIMC hardware
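The noise-robustness point above can be sketched in a few lines: on memristive AIMC hardware, each stored weight (conductance) is perturbed at read time, and a standard way to build tolerance is to inject a matching noise model into matrix-vector products during training. This is a hedged illustration, not the paper's noise model; the multiplicative Gaussian form and the 5% noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_matvec(W, x, noise_std=0.05, rng=rng):
    """Simulate an analog in-memory matrix-vector product: each stored
    weight is perturbed by multiplicative Gaussian noise, mimicking
    conductance variations on a memristive crossbar. Injecting this
    during training teaches the model to tolerate it at inference."""
    W_noisy = W * (1.0 + noise_std * rng.standard_normal(W.shape))
    return W_noisy @ x

W = rng.standard_normal((4, 4))
x = rng.standard_normal(4)
clean = W @ x              # ideal digital result
noisy = noisy_matvec(W, x) # what the analog substrate would return
```

With `noise_std=0.0` the simulated analog product reduces exactly to the digital one, which makes the noise injection easy to toggle between training and evaluation.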