Topology-Aware Layer Pruning for Large Vision-Language Models

📅 2026-04-14
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the challenge of efficiently pruning large vision-language models (LVLMs), which suffer from high computational and memory costs. Existing pruning methods often fail to identify critical transitional layers that govern the evolution of multimodal representations, leading to significant performance degradation. To overcome this limitation, the study introduces topological data analysis into LVLM pruning for the first time. By modeling hidden states of each layer as point clouds, it employs simplicial complexes and Zigzag persistent homology to characterize the evolution of their topological structures across layers. This enables a quantitative measure of inter-layer topological consistency, which is then used to adaptively preserve essential transitional layers. The proposed method consistently outperforms existing pruning strategies across multiple multimodal benchmarks and maintains superior performance under varying sparsity levels.

📝 Abstract
Large Language Models (LLMs) have demonstrated strong capabilities in natural language understanding and reasoning, while recent extensions that incorporate visual inputs enable them to process multimodal information. Despite these advances, Large Vision-Language Models (LVLMs) incur substantial computational and memory costs, hindering deployment in resource-constrained scenarios. Existing layer pruning methods typically rely on local similarity metrics or static proxy signals and fail to capture the global and dynamic evolution of representations across model depth, which often leads to the removal of transition-critical layers. To address this limitation, we propose a topology-aware layer pruning framework for LVLMs. Specifically, we represent layer-wise hidden states as point clouds and model their evolution using simplicial complexes. By leveraging zigzag persistent homology, we quantify inter-layer topological consistency and enable adaptive pruning that preserves critical representational transitions. Extensive experiments on diverse multimodal benchmarks demonstrate that the proposed framework consistently outperforms existing pruning methods across a wide range of sparsity ratios. Our code is available at https://github.com/zpc456/TopoVLM.
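
The idea of treating each layer's hidden states as a point cloud and comparing topology across depth can be illustrated with a much simpler stand-in. The sketch below is not the paper's method: it does not build a zigzag filtration, and it uses only 0-dimensional persistent homology of a Vietoris-Rips filtration (whose death times equal the edge lengths of the Euclidean minimum spanning tree). The function names, the normalisation, and the "smallest change is most prunable" rule are all assumptions made for illustration; the authors' actual implementation is in the repository linked above.

```python
import numpy as np


def h0_death_times(points):
    """Sorted death times of H0 classes in a Vietoris-Rips filtration.

    These equal the edge lengths of the Euclidean minimum spanning tree,
    computed here with Prim's algorithm on the dense distance matrix.
    """
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()  # cheapest known edge from the tree to each point
    deaths = []
    for _ in range(n - 1):
        masked = np.where(in_tree, np.inf, best)
        j = int(np.argmin(masked))
        deaths.append(masked[j])
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return np.sort(np.array(deaths))


def topological_change(layer_a, layer_b):
    """Mean gap between scale-normalised H0 death times of two layers.

    Assumes both layers hold the same number of token vectors.
    """
    da = h0_death_times(layer_a)
    db = h0_death_times(layer_b)
    da = da / (da.max() + 1e-12)
    db = db / (db.max() + 1e-12)
    return float(np.abs(da - db).mean())


def rank_layers_for_pruning(hidden_states, num_prune):
    """Hypothetical rule: prune the layers whose output changes topology least.

    hidden_states[i] is an (n_tokens, dim) array for layer i.
    Returns indices i >= 1 of the layers selected for removal.
    """
    changes = [
        topological_change(hidden_states[i], hidden_states[i + 1])
        for i in range(len(hidden_states) - 1)
    ]
    order = np.argsort(changes)
    return sorted(int(i) + 1 for i in order[:num_prune])


# Toy example: layer 1 is a rescaled copy of layer 0, layer 2 is reshaped.
layer0 = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [6.0, 0.0]])
layer1 = 2.0 * layer0
layer2 = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
pruned = rank_layers_for_pruning([layer0, layer1, layer2], num_prune=1)
```

In this toy input the rescaled layer leaves the normalised death times unchanged, so it is the one selected. A layer that reshapes the point cloud, as the third one does, scores a larger change and would be kept as a transition layer under this assumed rule.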
Problem

Research questions and friction points this paper is trying to address.

layer pruning
Large Vision-Language Models
topological consistency
representational transitions
multimodal models
Innovation

Methods, ideas, or system contributions that make the work stand out.

topology-aware pruning
simplicial complexes
zigzag persistent homology
layer pruning
vision-language models
Pengcheng Zheng
University of Electronic Science and Technology of China
Chaoning Zhang
Professor at UESTC (电子科技大学, China)
Computer Vision, LLM and VLM, GenAI and AIGC Detection
Ya Wen
University of Electronic Science and Technology of China
Wang Liu
University of Electronic Science and Technology of China
Qigan Sun
Kyung Hee University
Jiarong Mo
University of Electronic Science and Technology of China
Jiaquan Zhang
University of Electronic Science and Technology of China
Jewon Lee
Nota Inc.
AI
Tae-Ho Kim
Nota Inc.
Kuien Liu
Institute of Software, Chinese Academy of Sciences
Tianyu Li
University of Electronic Science and Technology of China
Caiyan Qin
Harbin Institute of Technology, Shenzhen
Yang Yang
University of Electronic Science and Technology of China