On Information Geometry and Iterative Optimization in Model Compression: Operator Factorization

📅 2025-07-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses deep learning model compression for resource-constrained devices by proposing a unified information-geometric framework: compression is formulated as a projection onto an optimal low-dimensional computational submanifold within the parameter manifold. Methodologically, it integrates operator factorization, iterative singular value thresholding (with proven convergence), soft rank constraints, and smoothed rank reduction, using information divergences as the projection objective when compressing pre-trained models (zero-shot) and iterative projection when fine-tuning. Key contributions include: (i) framing information divergence as an explicit optimization objective for zero-shot compression; and (ii) the empirical and theoretical finding that the trainability of bottlenecked models, rather than the fidelity of the divergence-based projection, is the dominant factor governing compressibility under fine-tuning. Experiments demonstrate that minimal modifications to existing pipelines substantially reduce accuracy degradation at fixed high compression ratios, achieving superior accuracy-efficiency trade-offs across diverse architectures and tasks.

📝 Abstract
The ever-increasing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of density-induced metrics on parameter spaces, to analyze existing methods within the space of model compression, primarily focusing on operator factorization. Adopting this perspective highlights the core challenge: defining an optimal low-compute submanifold (or subset) and projecting onto it. We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection. We highlight that when compressing a pre-trained model, using information divergences is paramount for achieving improved zero-shot accuracy, yet this may no longer be the case when the model is fine-tuned. In such scenarios, trainability of bottlenecked models turns out to be far more important for achieving high compression ratios with minimal performance degradation, necessitating adoption of iterative methods. In this context, we prove convergence of iterative singular value thresholding for training neural networks subject to a soft rank constraint. To further illustrate the utility of this perspective, we showcase how simple modifications to existing methods through softer rank reduction result in improved performance under fixed compression rates.
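The operator factorization the abstract centers on is, in its simplest Euclidean form, a truncated SVD of a weight matrix: the rank-r factors are the closest rank-r point in Frobenius norm (Eckart–Young), which is the baseline projection that divergence-aware methods refine. A minimal sketch (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def truncated_svd_factorize(W, rank):
    """Project a weight matrix onto the rank-r submanifold via truncated SVD.

    Returns factors (A, B) with A @ B equal to the Euclidean-optimal
    rank-r approximation of W (Eckart-Young theorem). Replacing a dense
    layer W with the pair (A, B) is the basic operator-factorization
    compression step.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # (m, r): left factor, singular values absorbed
    B = Vt[:rank, :]             # (r, n): right factor
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
A, B = truncated_svd_factorize(W, rank=8)
print(A.shape, B.shape)  # (64, 8) (8, 32)
```

Storing A and B costs r(m + n) parameters instead of mn, so for a 64x32 layer at rank 8 the factors use 768 parameters versus 2048.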
Problem

Research questions and friction points this paper is trying to address.

Applying information geometry to analyze model compression techniques
Defining optimal low-compute submanifolds for efficient model projection
Improving iterative methods for high compression with minimal performance loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Applying information geometry to model compression
Using iterative singular value thresholding
Softer rank reduction improves compression performance
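The "softer" rank reduction above can be illustrated by singular value soft-thresholding, the proximal operator of the nuclear norm: instead of hard-truncating singular values at a target rank, each is shrunk toward zero, which is the per-step operation in iterative singular value thresholding. A hedged sketch under that assumption (the training loop is a toy quadratic placeholder, not the paper's procedure):

```python
import numpy as np

def svt_prox(W, tau):
    """Soft-threshold the singular values of W (prox of tau * nuclear norm).

    Each singular value is reduced by tau and clipped at zero, giving a
    gradual ("soft") rank reduction rather than a hard rank cutoff.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Toy proximal-gradient loop on a quadratic loss 0.5 * ||W||_F^2,
# interleaving a gradient step with the SVT prox; step size and tau
# are illustrative choices, not values from the paper.
W = np.diag([3.0, 1.0, 0.1])
for _ in range(5):
    grad = W                          # gradient of the toy loss
    W = svt_prox(W - 0.1 * grad, tau=0.05)
```

Applied once to diag(3, 1, 0.1) with tau = 0.5, the prox maps the singular values to (2.5, 0.5, 0), dropping the rank from 3 to 2 while perturbing the dominant directions only slightly.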