🤖 AI Summary
Deep neural networks suffer from excessive computational overhead and energy consumption due to over-parameterization, hindering deployment on resource-constrained devices. To address this, we propose an optimal-transport-based, fine-tuning-free layer-compression method: redundant intermediate layers are removed outright by minimizing the Max-Sliced Wasserstein Distance (MSWD) between the feature distributions of adjacent layers. To our knowledge, this is the first work to use the MSWD as a regularization objective for layer compression, eliminating the need for retraining, pruning, or knowledge distillation. Evaluated on image classification tasks, our approach fully removes multiple intermediate layers with <0.5% accuracy degradation, significantly reduces FLOPs, and preserves end-to-end inference consistency. Our key contributions are: (i) a theoretically grounded, near-lossless layer-collapse framework; (ii) zero-shot, fine-tuning-free compression; and (iii) efficient modeling of structural redundancy in deep networks.
📝 Abstract
Deep neural networks are well known for their remarkable performance on complex tasks, but their appetite for computational resources remains a significant hurdle: it raises energy-consumption concerns and restricts deployment on resource-constrained devices, stalling their widespread adoption. In this paper, we present an optimal-transport method to reduce the depth of over-parametrized deep neural networks, alleviating their computational burden. More specifically, we propose a new regularization strategy based on the Max-Sliced Wasserstein distance that minimizes the distance between intermediate feature distributions in the network. We show that minimizing this distance enables the complete removal of intermediate layers, with almost no performance loss and without any fine-tuning. We assess the effectiveness of our method on standard image classification setups. We will release the source code upon acceptance of the article.
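As a rough illustration of the quantity being minimized (not the authors' implementation, whose details are not given here), the Max-Sliced Wasserstein distance between two equal-sized batches of intermediate features can be approximated by projecting both batches onto candidate unit directions and taking the direction with the largest 1-D Wasserstein distance. The function names and the random-search strategy below are illustrative assumptions; in practice the maximizing direction is often found by gradient ascent instead.

```python
import numpy as np

def sliced_w1(x, y, theta):
    # Project both feature batches (n_samples, dim) onto the unit
    # direction theta, then compute the 1-D Wasserstein-1 distance
    # between the projections via their sorted (empirical) quantiles.
    px = np.sort(x @ theta)
    py = np.sort(y @ theta)
    return np.abs(px - py).mean()

def max_sliced_wasserstein(x, y, n_candidates=512, seed=0):
    # Approximate the max-sliced W1 by searching over random unit
    # directions (illustrative; a learned/optimized direction is
    # the more common choice in practice).
    rng = np.random.default_rng(seed)
    thetas = rng.normal(size=(n_candidates, x.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    return max(sliced_w1(x, y, t) for t in thetas)
```

Used as a regularizer, this distance would be evaluated between the feature distributions of adjacent layers; once it is driven near zero, the intervening layer approximates an identity map and can be removed without fine-tuning.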