🤖 AI Summary
To address the severe performance degradation that large vision-language models (VLMs) suffer after layer pruning, this paper proposes an interleaved layer-pruning and efficient fine-tuning framework. Methodologically, it introduces a local redundancy analysis over triplets of consecutive layers, coupled with an alternating prune–finetune–freeze optimization scheme that anchors critical layers and adapts with few samples. Using only 1% of the FineVision dataset and a single epoch of fine-tuning, the method retains 88.9% of average task performance while removing 25% of the network's layers, substantially outperforming existing approaches. This work is the first to jointly leverage structured redundancy modeling and low-data-dependency fine-tuning for VLM compression, achieving state-of-the-art performance preservation while significantly improving inference efficiency and deployment feasibility.
📝 Abstract
We introduce INTERLACE, a novel framework that prunes redundant layers in VLMs while maintaining performance through sample-efficient finetuning. Existing layer-pruning methods cause significant performance drops when applied to VLMs. Instead, we analyze triplets of consecutive layers to identify local redundancy: we remove the more redundant of the first two layers, finetune the remaining layer to compensate for the lost capacity, and freeze the third layer to serve as a stable anchor during finetuning. We find that this interleaved finetune-freeze design enables rapid convergence with minimal data after pruning. By finetuning only a subset of layers on just 1% of the FineVision dataset for one epoch, Interlace achieves 88.9% average performance retention after dropping 25% of the network, achieving SOTA performance. Our code is available at: https://github.com/pmadinei/Interlace.git
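The triplet logic above can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's implementation: it assumes layers are plain callables and uses input/output cosine similarity as a stand-in redundancy metric (a layer whose output closely matches its input changes little and is a pruning candidate); the `prune_triplet` helper and its return format are hypothetical.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two activation vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def redundancy(layer, x):
    # High input/output similarity => the layer changes little => more redundant.
    return cosine(x, layer(x))

def prune_triplet(layers, i, x):
    """For the triplet (i, i+1, i+2): drop the more redundant of the first two
    layers, mark the survivor for finetuning, and freeze the third as an anchor."""
    h = x
    for layer in layers[:i]:          # propagate the probe input up to layer i
        h = layer(h)
    r0 = redundancy(layers[i], h)
    r1 = redundancy(layers[i + 1], layers[i](h))
    drop = i if r0 >= r1 else i + 1
    keep = i + 1 if drop == i else i
    return {"drop": drop, "finetune": keep, "freeze": i + 2}
```

For example, with a near-identity first layer (highly redundant) the plan drops it, finetunes the second layer, and freezes the third. In a real VLM the redundancy would be estimated from calibration-batch activations, and "freeze" would translate to setting `requires_grad=False` on the anchor layer's parameters.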