EffiVLM-BENCH: A Comprehensive Benchmark for Evaluating Training-Free Acceleration in Large Vision-Language Models

📅 2025-05-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) face significant deployment challenges due to their substantial computational overhead. To address this, we introduce EffiVLM-Bench, the first comprehensive benchmark for training-free acceleration of LVLMs, designed to systematically evaluate token- and parameter-level compression methods across diverse backbones (e.g., Qwen-VL, LLaVA), multimodal tasks (VQA, image captioning, visual reasoning), and multiple evaluation criteria (absolute performance, generalization, loyalty). We propose a three-dimensional evaluation framework and formally characterize the Pareto-optimal trade-off frontier in LVLM acceleration. Leveraging training-free techniques such as token pruning, quantization, and attention sparsification, the best-performing configurations achieve up to a 2.1× inference speedup on VQA with only a 0.8% accuracy drop. All code, configurations, and baseline results are publicly released.
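The token-compression side of the benchmark can be illustrated with a minimal sketch of attention-score-based visual token pruning. This is an illustrative example under assumed inputs, not the exact method of any benchmarked system; `prune_visual_tokens` and its arguments are hypothetical names.

```python
def prune_visual_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the top-k visual tokens ranked by an importance score.

    tokens: sequence of token embeddings (any objects)
    scores: one importance value per token, e.g. the mean attention
            each visual token receives from the text query
    keep_ratio: fraction of tokens to retain (at least one is kept)
    """
    k = max(1, int(len(tokens) * keep_ratio))
    # Rank indices by descending score, keep the top-k,
    # then restore the original (spatial) order.
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep]
```

In a real LVLM this would gather rows of a `(batch, n_tokens, dim)` tensor; the list version above only shows the ranking-and-reordering logic.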

📝 Abstract
Large Vision-Language Models (LVLMs) have achieved remarkable success, yet their significant computational demands hinder practical deployment. While efforts to improve LVLM efficiency are growing, existing methods lack comprehensive evaluation across diverse backbones, benchmarks, and metrics. In this work, we systematically evaluate mainstream acceleration techniques for LVLMs, categorized into token and parameter compression. We introduce EffiVLM-Bench, a unified framework for assessing not only absolute performance but also generalization and loyalty, while exploring Pareto-optimal trade-offs. Our extensive experiments and in-depth analyses offer insights into optimal strategies for accelerating LVLMs. We open-source code and recipes for EffiVLM-Bench to foster future research.
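Parameter compression in the training-free setting is typified by post-training quantization. Below is a minimal sketch of symmetric per-tensor int8 weight quantization, assuming plain Python lists for clarity; real systems quantize tensors per channel, and the function names here are illustrative, not from the paper.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization (training-free).

    Returns (int8 values, scale); recover weights as q * scale.
    """
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid div-by-zero
    scale = max_abs / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to approximate float weights."""
    return [v * scale for v in q]
```

Because no gradient updates are involved, this kind of compression can be applied directly to a pretrained LVLM checkpoint, which is what makes it "training-free".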
Problem

Research questions and friction points this paper is trying to address.

Evaluating training-free acceleration in large vision-language models
Lack of comprehensive efficiency assessment across diverse backbones, benchmarks, and metrics
Exploring optimal trade-offs for accelerating LVLMs effectively
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates token and parameter compression techniques
Introduces EffiVLM-Bench for unified performance assessment
Explores Pareto-optimal trade-offs in acceleration
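The Pareto-optimal trade-off exploration above can be sketched as follows: given (speedup, accuracy) pairs for candidate acceleration configurations, keep only those not dominated on both axes. This is a generic illustration of Pareto filtering, not EffiVLM-Bench's actual code.

```python
def pareto_frontier(points):
    """Return the Pareto-optimal subset of (speedup, accuracy) pairs.

    Both objectives are higher-is-better. A point p is dominated if
    some other point q is at least as good on both axes.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            frontier.append(p)
    return frontier
```

For example, a configuration with 2.0× speedup at 75% accuracy is dropped if another configuration reaches 2.0× at 78%, while a slower but more accurate configuration stays on the frontier.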
👥 Authors
Zekun Wang (Harbin Institute of Technology, Harbin, China)
Minghua Ma (Microsoft; AIOps, Cloud Intelligence)
Zexin Wang (Harbin Institute of Technology, Harbin, China)
Rongchuan Mu (Harbin Institute of Technology, Harbin, China)
Liping Shan (Pengcheng Laboratory, Shenzhen, China)
Ming Liu (Harbin Institute of Technology, Harbin, China; Pengcheng Laboratory, Shenzhen, China)
Bing Qin (Professor at Harbin Institute of Technology; Natural Language Processing, Information Extraction, Sentiment Analysis)