LaCoOT: Layer Collapse through Optimal Transport

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks suffer from excessive computational overhead and energy consumption due to over-parameterization, which hinders their deployment on resource-constrained devices. To address this, the paper proposes an optimal-transport-based, fine-tuning-free layer compression method: during training, a regularizer minimizes the Max-Sliced Wasserstein distance (MSWD) between the feature distributions of adjacent layers, so that redundant intermediate layers can afterwards be removed outright. This is presented as the first work to use MSWD as a regularization objective for layer compression, eliminating the need for retraining, pruning, or knowledge distillation. Evaluated on image classification tasks, the approach fully removes multiple intermediate layers with under 0.5% accuracy degradation and a significant reduction in FLOPs, and the compressed network is used directly at inference. Key contributions: (i) a theoretically grounded, near-lossless layer-collapse framework; (ii) fine-tuning-free compression; and (iii) efficient modeling of structural redundancy in deep networks.
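
At the heart of the method is the Max-Sliced Wasserstein distance between two empirical feature distributions. The following is a minimal PyTorch sketch, not the authors' code: the exact MSWD maximizes the 1D Wasserstein distance over all projection directions, whereas this version approximates that maximum over a finite sample of random directions; `n_dirs`, `p`, and the equal-batch-size assumption are choices made here for illustration.

```python
import torch

def max_sliced_wasserstein(x: torch.Tensor, y: torch.Tensor,
                           n_dirs: int = 128, p: int = 2) -> torch.Tensor:
    """Monte Carlo approximation of MSWD between feature batches.

    x, y: (batch, dim) tensors; assumes equal batch sizes so the 1D
    Wasserstein distance reduces to comparing sorted projections.
    """
    dim = x.shape[1]
    # Sample random unit directions on the sphere, shape (dim, n_dirs).
    theta = torch.randn(dim, n_dirs, device=x.device)
    theta = theta / theta.norm(dim=0, keepdim=True)
    # Project both batches onto every direction: (batch, n_dirs).
    x_proj, y_proj = x @ theta, y @ theta
    # 1D p-Wasserstein between empirical measures = distance of sorted samples.
    x_sorted, _ = torch.sort(x_proj, dim=0)
    y_sorted, _ = torch.sort(y_proj, dim=0)
    w_p = (x_sorted - y_sorted).abs().pow(p).mean(dim=0).pow(1.0 / p)
    # Max-sliced: keep the single most discriminative direction.
    return w_p.max()
```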

📝 Abstract
Although deep neural networks are well-known for their remarkable performance in tackling complex tasks, their hunger for computational resources remains a significant hurdle, posing energy-consumption issues and restricting their deployment on resource-constrained devices, which stalls their widespread adoption. In this paper, we present an optimal transport method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance to minimize the distance between the intermediate feature distributions in the neural network. We show that minimizing this distance enables the complete removal of intermediate layers in the network, with almost no performance loss and without requiring any finetuning. We assess the effectiveness of our method on traditional image classification setups. We commit to releasing the source code upon acceptance of the article.
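
To make the regularization strategy in the abstract concrete, here is a hedged sketch of how such an MSWD term could enter the training loss between each block's input and output. All names (`training_step`, `blocks`, `head`, `lam`) are illustrative, it reuses `max_sliced_wasserstein` from the sketch above, and it assumes every block preserves its feature shape (as in a transformer); it is not the authors' released implementation.

```python
import torch
import torch.nn.functional as F
# max_sliced_wasserstein: the sketch defined earlier on this page.

def training_step(blocks, head, x, labels, lam=0.1):
    """One step of task loss + MSWD regularization across blocks."""
    feats, reg = x, x.new_zeros(())
    for block in blocks:  # e.g. transformer blocks of equal width
        out = block(feats)
        # Penalize the distributional gap between the block's input and
        # output features (flattened to (batch, dim) for the MSWD sketch).
        reg = reg + max_sliced_wasserstein(feats.flatten(1), out.flatten(1))
        feats = out
    task_loss = F.cross_entropy(head(feats), labels)
    # Minimizing the regularizer pushes blocks toward distribution-preserving
    # maps, which is what later makes them removable without fine-tuning.
    return task_loss + lam * reg
```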
Problem

Research questions and friction points this paper is trying to address.

Reducing computational burden in deep neural networks
Minimizing distance between intermediate feature distributions
Achieving better performance-depth trade-off in networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal transport-based layer collapse method
Max-Sliced Wasserstein distance regularization
Removes intermediate layers for efficiency (see the collapse sketch after this list)
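
As referenced in the last bullet, a hypothetical post-training collapse step might look like this; `distances` (a per-block MSWD measured after training) and the threshold `tau` are illustrative assumptions, since this page does not specify the paper's exact removal criterion.

```python
import torch.nn as nn

def collapse(blocks, distances, tau=1e-2):
    """Drop near-identity blocks; keep the rest, with no fine-tuning after."""
    kept = [b for b, d in zip(blocks, distances) if d >= tau]
    return nn.Sequential(*kept)
```

In this view, compression is a one-shot structural edit: blocks whose input and output distributions were driven together during training are deleted, and the shallower network is used as-is.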
Victor Quétu
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Nour Hezbri
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Enzo Tartaglione
Associate Professor, Télécom Paris, Institut Polytechnique de Paris
deep learning, compression, pruning, debiasing, frugal AI