SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models

📅 2024-10-14
🏛️ arXiv.org
📈 Citations: 32
Influential: 0
🤖 AI Summary
Existing layer-pruning methods neglect intrinsic inter-layer dependencies and semantic correlations in deep neural networks, leading to severe knowledge loss. To address this, we propose a representation-driven hierarchical partitioning pruning framework for large model compression. First, we leverage Centered Kernel Alignment (CKA) to measure inter-layer representation similarity and integrate Fisher-optimal segmentation to achieve semantically coherent layer partitioning. Second, within each segment we apply a fine-grained, structure-aware, fine-tuning-free importance scoring mechanism based on GradNorm to enable efficient post-training pruning. Our method significantly outperforms state-of-the-art approaches on both image classification and large language model tasks. Under substantial compression (up to 50% parameter reduction and a 40% FLOPs decrease), it incurs less than 0.5% accuracy degradation, demonstrating strong applicability to edge and resource-constrained deployment scenarios.
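The similarity-measurement step can be illustrated with linear CKA, the metric the paper uses to compare layer representations. The sketch below is an independent implementation of the standard linear-CKA formula, not code from the paper; `linear_cka` takes two activation matrices of shape (samples, features), such as the outputs of two layers on the same batch:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation
    matrices of shape (n_samples, n_features)."""
    # Center each feature dimension
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den
```

Comparing every pair of layers this way yields the similarity matrix that drives the subsequent partitioning; linear CKA is bounded in [0, 1] and invariant to orthogonal transformations and isotropic scaling of either representation.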

📝 Abstract
Layer pruning has emerged as a potent approach to removing redundant layers from a pre-trained network, with the aim of reducing network size and improving computational efficiency. However, existing layer pruning methods mostly overlook the intrinsic connections and inter-dependencies between different layers within complicated deep neural networks. This oversight can result in pruned models that do not preserve the essential characteristics of the pre-trained network as effectively as desired. To address these limitations, we propose Similarity-Guided Layer Partition (SGLP) Pruning, a novel framework that exploits representation similarity to guide efficient and informed layer removal for compressing large deep models. Our method begins by employing Centered Kernel Alignment (CKA) to quantify representational similarity between layers, uncovering structural patterns within the network. We then apply Fisher Optimal Segmentation to the similarity matrix to partition the network into semantically coherent layer segments. This segmentation allows pruning decisions to respect layer interdependencies and preserve essential knowledge. Within each segment, we introduce a fine-tuning-free importance evaluation using GradNorm, identifying and removing redundant layers in a targeted, segment-wise manner. Experimental results on both image classification tasks and large language models (LLMs) demonstrate that SGLP outperforms state-of-the-art methods in accuracy and efficiency. Our approach achieves significant model compression with minimal performance degradation, making it well-suited for deployment in resource-limited environments.
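Fisher Optimal Segmentation partitions an ordered sequence into contiguous segments by dynamic programming, which fits layer partitioning because layers have a fixed depth order. The sketch below is a generic illustration of that classic algorithm applied to per-layer feature vectors (for example, rows of the CKA similarity matrix); the function name and the dispersion criterion (within-segment sum of squared deviations from the segment mean) are my assumptions, not necessarily the paper's exact formulation:

```python
import numpy as np

def fisher_segmentation(features, k):
    """Split an ordered sequence into k contiguous segments by
    minimizing total within-segment dispersion (Fisher's optimal
    partition, solved exactly with dynamic programming)."""
    feats = np.asarray(features, dtype=float)
    n = len(feats)
    # D[i, j]: dispersion of the segment feats[i..j] (inclusive)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            seg = feats[i : j + 1]
            D[i, j] = ((seg - seg.mean(axis=0)) ** 2).sum()
    # dp[m, j]: best cost of splitting feats[0..j] into m segments
    dp = np.full((k + 1, n), np.inf)
    split = np.zeros((k + 1, n), dtype=int)
    dp[1] = D[0]
    for m in range(2, k + 1):
        for j in range(m - 1, n):
            for t in range(m - 2, j):  # t: end of the first m-1 segments
                cost = dp[m - 1, t] + D[t + 1, j]
                if cost < dp[m, j]:
                    dp[m, j] = cost
                    split[m, j] = t + 1
    # Backtrack the segment start indices
    bounds, j = [], n - 1
    for m in range(k, 1, -1):
        bounds.append(split[m, j])
        j = split[m, j] - 1
    starts = [0] + sorted(bounds)
    return [(s, e) for s, e in zip(starts, starts[1:] + [n])]
```

Because segments must be contiguous, the dynamic program finds the globally optimal k-way partition in O(k·n²) time after the O(n²) dispersion table is built; each returned (start, end) pair is a candidate block of layers to prune within.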
Problem

Research questions and friction points this paper is trying to address.

Compressing large deep models by removing redundant layers efficiently
Preserving essential network characteristics through similarity-guided pruning
Maintaining model accuracy while reducing computational requirements significantly
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses CKA to quantify layer representation similarity
Applies Fisher segmentation to partition layers semantically
Employs GradNorm for fine-tuning-free importance evaluation
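One plausible reading of the fine-tuning-free GradNorm score is a single backward pass on a calibration batch, ranking layers by the L2 norm of their weight gradients (small norm suggesting a redundant layer). The toy sketch below illustrates this for a stack of bias-free linear layers under squared-error loss; it is an illustrative assumption with hand-rolled gradients, not the paper's implementation:

```python
import numpy as np

def gradnorm_scores(weights, x, y):
    """Toy GradNorm-style layer scoring for a stack of bias-free
    linear layers under L = 0.5 * ||out - y||^2: one forward pass,
    one backward pass, score = L2 norm of each weight gradient."""
    # Forward pass, caching the input to each layer
    acts = [x]
    for W in weights:
        acts.append(acts[-1] @ W)
    # Backward pass: delta = dL/d(layer output)
    delta = acts[-1] - y
    scores = []
    for i in range(len(weights) - 1, -1, -1):
        grad_W = acts[i].T @ delta          # dL/dW_i
        scores.append(np.linalg.norm(grad_W))
        delta = delta @ weights[i].T        # propagate to previous layer
    scores.reverse()
    return scores  # one gradient norm per layer; smaller ~ more redundant
```

Because no parameters are updated, the score costs only one forward/backward pass per calibration batch, which is what makes a segment-wise, post-training ranking practical for large models.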