🤖 AI Summary
Large vision-language models (VLMs) incur prohibitive computational and memory costs due to long visual token sequences and massive parameter counts, and existing training-free compression methods suffer from three key limitations: techniques are not decomposed into comparable modules, evaluation is confined to narrow scenarios, and synergies between techniques go unexplored. Method: We introduce LLMC+, the first plug-and-play, training-free VLM compression benchmark platform. Its modular evaluation framework systematically distinguishes spatial from temporal redundancy and empirically validates synergistic gains from combining token-level (e.g., pruning, pooling) and model-level (e.g., quantization, distillation) compression techniques. Contribution/Results: The platform integrates 20+ algorithms across five major VLM architectures and enables fine-grained, multi-strategy evaluation on multi-turn dialogue and detail-sensitive tasks. Experiments demonstrate that joint compression consistently outperforms isolated methods while retaining robust performance, and the open-sourced implementation accelerates research on efficient VLMs.
📝 Abstract
Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands due to their long visual token sequences and massive parameter sizes. To address these issues, recent works have proposed training-free compression methods. However, existing efforts often suffer from three major limitations: (1) Current approaches do not decompose techniques into comparable modules, hindering fair evaluation across spatial and temporal redundancy. (2) Evaluation is confined to simple single-turn tasks, failing to reflect performance in realistic scenarios. (3) Individual compression techniques are used in isolation, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five representative VLM families and enables systematic study of token-level and model-level compression. Our benchmark reveals that: (1) Spatial and temporal redundancies demand distinct technical strategies. (2) Token reduction methods degrade significantly in multi-turn dialogue and detail-sensitive tasks. (3) Combining token and model compression achieves extreme compression with minimal performance loss. We believe LLMC+ will facilitate fair evaluation and inspire future research on efficient VLMs. Our code is available at https://github.com/ModelTC/LightCompress.
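To make the token-level compression discussed above concrete, the following is a minimal NumPy sketch of importance-based visual token pruning: keep only the top-scoring fraction of visual tokens (scores could come, e.g., from text-to-image attention). The function name, scoring scheme, and shapes here are illustrative assumptions for exposition, not the LLMC+ API or any specific algorithm from the benchmark.

```python
import numpy as np

def prune_visual_tokens(tokens, scores, keep_ratio=0.25):
    """Keep the top `keep_ratio` fraction of visual tokens by importance score.

    tokens: (N, D) array of visual token embeddings.
    scores: (N,) per-token importance scores (illustrative; e.g., attention mass).
    Returns the retained tokens in their original sequence order.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    # Indices of the n_keep highest-scoring tokens, sorted back to sequence order
    keep_idx = np.sort(np.argsort(scores)[-n_keep:])
    return tokens[keep_idx]

# Toy example: 8 visual tokens with 4-dim embeddings
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8, 4))
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.05, 0.7, 0.3, 0.6])
pruned = prune_visual_tokens(tokens, scores, keep_ratio=0.5)
print(pruned.shape)  # (4, 4): half the tokens survive
```

A model-level technique such as weight quantization would then be applied independently of this step, which is what makes the joint token-plus-model compression studied in the benchmark composable.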