🤖 AI Summary
To address the high memory and computational overhead of deploying deep neural networks in resource-constrained settings, this paper proposes a data-driven, post-training low-rank compression framework. Unlike conventional data-agnostic approaches, our method explicitly models the low-rank structure and noise bias inherent in activation tensors. We establish, for the first time, three progressive recovery theorems that theoretically justify the superiority of data-driven compression and provide provable accuracy-efficiency trade-off guarantees. Leveraging matrix perturbation analysis, random matrix theory, and empirical risk minimization—combined with truncated SVD and adaptive rank selection—we achieve 40–60% model compression on standard image classification benchmarks, with less than 0.5% top-1 accuracy degradation and a 35% reduction in inference latency. Crucially, our theoretically derived error bounds align closely with empirical observations.
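To make the data-driven idea concrete, here is a minimal sketch (the function name and calibration setup are illustrative, not the paper's algorithm): rather than applying truncated SVD to the weight matrix alone, the SVD is taken over the layer's responses on calibration activations, so the retained subspace is the one that matters for real inputs.

```python
import numpy as np

def data_driven_lowrank(W, X, rank):
    """Compress a linear layer with weights W (out x in) using calibration
    activations X (in x n_samples).

    Data-driven: the truncated SVD is computed on the layer's outputs W @ X
    instead of on W itself, so the kept directions reflect the activation
    statistics rather than the weights alone.
    """
    Y = W @ X                              # responses on calibration data
    U, S, Vt = np.linalg.svd(Y, full_matrices=False)
    Uk = U[:, :rank]                       # top-k output subspace (out x k)
    # Factor W ≈ Uk @ (Uk.T @ W): the layer becomes two thin matmuls,
    # costing (out*k + k*in) multiply-adds instead of out*in.
    A = Uk                                 # out x k
    B = Uk.T @ W                           # k x in
    return A, B
```

In use, the original layer `x -> W @ x` is replaced by `x -> A @ (B @ x)`; at full rank the factorization is exact, and truncating trades accuracy for compute.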
📝 Abstract
Deep neural networks have achieved state-of-the-art performance across numerous applications, but their high memory and computational demands present significant challenges, particularly in resource-constrained environments. Model compression techniques, such as low-rank approximation, offer a promising solution by reducing the size and complexity of these networks while only minimally sacrificing accuracy. In this paper, we develop an analytical framework for data-driven post-training low-rank compression. We prove three recovery theorems under progressively weaker assumptions on the approximate low-rank structure of activations, modeling deviations via noise. Our results represent a step toward explaining why data-driven low-rank compression methods outperform data-agnostic approaches and toward theoretically grounded compression algorithms that reduce inference costs while maintaining performance.
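The summary also mentions adaptive rank selection. One common heuristic (shown here as an assumed illustration, not the paper's criterion) is to pick the smallest rank whose singular values capture a target fraction of the spectral energy:

```python
import numpy as np

def select_rank(singular_values, energy=0.95):
    """Return the smallest rank capturing `energy` fraction of the
    squared-singular-value (spectral) energy.

    Assumes `singular_values` is sorted in descending order, as returned
    by np.linalg.svd.
    """
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / np.sum(s2)      # fraction of energy kept per rank
    return int(np.searchsorted(cumulative, energy) + 1)
```

A spectrum with one dominant value yields a small rank, while a flat spectrum forces a rank close to full, which is the accuracy-efficiency trade-off the theorems bound.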