RMT-KD: Random Matrix Theoretic Causal Knowledge Distillation

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large deep models (e.g., BERT, ResNet) on resource-constrained edge devices remains challenging. To address this, we propose a causally grounded knowledge distillation method based on Random Matrix Theory (RMT). Unlike conventional pruning or heuristic low-rank approximations, our approach mathematically identifies and preserves information-rich principal directions by analyzing the spectral distribution of hidden-layer representations—enabling layer-wise causal structural compression. Integrated with self-distillation, it jointly enforces inter-layer causal reduction and representation stability. On multiple benchmark tasks, the compressed models achieve up to 80% parameter reduction with only a 2% accuracy drop, a 2.8× inference speedup, and a 47% power consumption reduction. This work is the first to systematically incorporate RMT into knowledge distillation, establishing a theoretically rigorous, interpretable, and causally principled paradigm for model compression.
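The paper does not publish its exact selection rule, but the spectral idea it describes is standard in RMT: eigenvalues of a hidden-representation covariance that rise above the Marchenko-Pastur noise bulk mark "information-rich" directions. A minimal sketch under that assumption (the function name, noise-variance estimate, and threshold are illustrative, not the authors' implementation):

```python
import numpy as np

def informative_directions(H):
    """Select principal directions of hidden representations H (n_samples x d)
    whose covariance eigenvalues exceed the Marchenko-Pastur bulk edge.

    Illustrative sketch only; the paper's exact criterion is not public.
    """
    n, d = H.shape
    Hc = H - H.mean(axis=0)                 # center features
    sigma2 = Hc.var()                       # crude noise-variance estimate (assumption)
    C = Hc.T @ Hc / n                       # sample covariance (d x d)
    evals, evecs = np.linalg.eigh(C)        # ascending eigenvalues, orthonormal eigenvectors
    gamma = d / n                           # aspect ratio of the data matrix
    lam_plus = sigma2 * (1 + np.sqrt(gamma)) ** 2   # MP upper bulk edge
    keep = evals > lam_plus                 # directions rising above the noise bulk
    return evecs[:, keep], evals[keep]
```

On data that is pure noise plus one strong signal direction, this keeps roughly one direction and discards the rest, which is the behavior the summary's "information-rich principal directions" phrasing suggests.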

📝 Abstract
Large deep learning models such as BERT and ResNet achieve state-of-the-art performance but are costly to deploy at the edge due to their size and compute demands. We present RMT-KD, a compression method that leverages Random Matrix Theory (RMT) for knowledge distillation to iteratively reduce network size. Instead of pruning or heuristic rank selection, RMT-KD preserves only informative directions identified via the spectral properties of hidden representations. RMT-based causal reduction is applied layer by layer with self-distillation to maintain stability and accuracy. On GLUE, AG News, and CIFAR-10, RMT-KD achieves up to 80% parameter reduction with only 2% accuracy loss, delivering 2.8x faster inference and nearly halved power consumption. These results establish RMT-KD as a mathematically grounded approach to network distillation.
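The layer-by-layer reduction the abstract describes can be pictured as replacing a dense weight matrix with a factorization over the kept directions; with `k` directions out of `d_in` inputs, parameters drop from `d_out * d_in` to `k * (d_out + d_in)`. A hedged sketch (the helper and its factorization are illustrative, not the paper's exact procedure):

```python
import numpy as np

def compress_layer(W, P):
    """Replace dense weight W (d_out x d_in) with a rank-k factorization
    W ~ A @ B, where B = P.T projects inputs onto the k kept directions
    (P: d_in x k, orthonormal columns) and A = W @ P maps them onward.

    Illustrative sketch of low-rank layer replacement, not the authors' code.
    """
    A = W @ P        # d_out x k
    B = P.T          # k x d_in
    return A, B
```

Applying `A @ B` to an input is exactly applying `W` after projecting the input onto the retained subspace, which is why accuracy survives when the discarded directions carry mostly noise.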
Problem

Research questions and friction points this paper is trying to address.

Large models such as BERT and ResNet are too costly in size and compute to deploy on resource-constrained edge devices
Conventional pruning and heuristic low-rank approximations lack a principled criterion for what to discard
Compression must shrink the network without sacrificing accuracy or inference efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

RMT-based, causally grounded knowledge distillation for compression
Preserves informative principal directions identified via the spectral properties of hidden representations
Layer-by-layer reduction with self-distillation maintains stability and accuracy
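The self-distillation mentioned above typically means training the compressed network against the original one with a combined task-plus-distillation objective. A minimal sketch of one common form (temperature-scaled KL plus cross-entropy); the exact objective in the paper may differ:

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    e = np.exp(z / T - (z / T).max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of cross-entropy on true labels and a temperature-scaled
    KL term pulling the compressed (student) network toward the original
    (teacher). Hypothetical form; the paper's objective is not public.
    """
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels] + 1e-12).mean()
    p_t = softmax(teacher_logits, T)
    p_s_T = softmax(student_logits, T)
    kd = (p_t * (np.log(p_t + 1e-12) - np.log(p_s_T + 1e-12))).sum(axis=-1).mean() * T * T
    return alpha * ce + (1 - alpha) * kd
```

When the student matches the teacher exactly, the KL term vanishes and only the task loss remains, which is the "stability" role the bullet above refers to: the compressed layers are kept close to the original representations while still fitting the labels.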