Mamba-based PKD for efficient knowledge compression

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of balancing accuracy and efficiency in deploying deep neural networks (DNNs) under resource constraints, this paper introduces, for the first time, the Mamba architecture—based on selective state space models (S-SSMs)—into a progressive knowledge distillation (PKD) framework, yielding a lightweight student model tailored for image classification. Departing from conventional CNN- or RNN-based distillation paradigms, our approach enables scalable design of weak student models via multi-stage knowledge transfer. Experiments demonstrate that, on MNIST, the smallest student model achieves 72% accuracy using only 1% of the teacher’s FLOPs; a seven-model ensemble attains 98% accuracy with just 63% of the teacher’s FLOPs. On CIFAR-10, a compact student model reaches 50% accuracy with merely 5% of the teacher’s FLOPs, incurring only ~1% accuracy degradation. These results significantly enhance computational efficiency and practical deployability in resource-constrained settings.
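The summary above describes multi-stage knowledge transfer from a large teacher to progressively smaller students. The paper itself does not include code; as a rough illustration, here is a minimal sketch of the temperature-scaled soft-target loss that each distillation stage would typically minimize (standard Hinton-style KD with a hypothetical `distillation_loss` helper, not necessarily the authors' exact objective):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence between teacher and student soft targets,
    # scaled by T^2 so gradients stay comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((T * T) * np.sum(p * (np.log(p) - np.log(q))))

# Matching logits give zero loss; mismatched logits give a positive loss.
same = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
diff = distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1])
```

In a progressive scheme, the student trained at one stage would serve as the teacher for the next, smaller student, repeating this loss at each stage.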

📝 Abstract
Deep neural networks (DNNs) have remarkably succeeded in various image processing tasks. However, their large size and computational complexity present significant challenges for deploying them in resource-constrained environments. This paper presents an innovative approach for integrating Mamba Architecture within a Progressive Knowledge Distillation (PKD) process to address the challenge of reducing model complexity while maintaining accuracy in image classification tasks. The proposed framework distills a large teacher model into progressively smaller student models, designed using Mamba blocks. Each student model is trained using Selective-State-Space Models (S-SSM) within the Mamba blocks, focusing on important input aspects while reducing computational complexity. The work's preliminary experiments use MNIST and CIFAR-10 as datasets to demonstrate the effectiveness of this approach. For MNIST, the teacher model achieves 98% accuracy. A set of seven student models as a group retained 63% of the teacher's FLOPs, approximating the teacher's performance with 98% accuracy. The weak student used only 1% of the teacher's FLOPs and maintained 72% accuracy. Similarly, for CIFAR-10, the students achieved 1% less accuracy compared to the teacher, with the small student retaining 5% of the teacher's FLOPs to achieve 50% accuracy. These results confirm the flexibility and scalability of Mamba Architecture, which can be integrated into PKD, succeeding in the process of finding students as weak learners. The framework provides a solution for deploying complex neural networks in real-time applications with a reduction in computational cost.
Problem

Research questions and friction points this paper is trying to address.

How to reduce model complexity in image classification tasks.
How to maintain accuracy while decreasing computational requirements.
How to enable deployment in resource-constrained environments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mamba Architecture integrated with PKD
Selective-State-Space Models reduce complexity
Progressive distillation maintains high accuracy
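To make the "selective" idea behind the S-SSM bullet concrete, here is a heavily simplified toy scan (assumed structure, not the authors' implementation; real Mamba also learns an input-dependent discretization step and operates channel-wise):

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_ssm(x, A, W_B, W_C):
    # Simplified selective state-space scan: the projections B_t and C_t
    # depend on the current input x_t, so the recurrence can gate which
    # inputs update the hidden state -- the "selective" part of S-SSM.
    T, _ = x.shape
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(T):
        B_t = W_B @ x[t]          # input-dependent input projection
        C_t = W_C @ x[t]          # input-dependent output projection
        h = A * h + B_t           # diagonal A keeps each step O(d_state)
        ys.append(float(C_t @ h))
    return np.array(ys)

x = rng.normal(size=(6, 4))       # sequence of 6 tokens, feature dim 4
A = np.full(8, 0.9)               # stable diagonal state transition
W_B = rng.normal(size=(8, 4)) * 0.1
W_C = rng.normal(size=(8, 4)) * 0.1
y = selective_ssm(x, A, W_B, W_C) # one scalar output per time step
```

Because the per-step cost is linear in the state size rather than quadratic in sequence length, stacking such blocks is what lets the student models stay cheap in FLOPs.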