🤖 AI Summary
This work addresses the critical reliability gap of vision-language models (VLMs) under video tampering. To this end, we introduce MVTamperBench—the first benchmark dedicated to evaluating VLM robustness against video tampering. We propose the first video-level tampering robustness evaluation paradigm, systematically assessing five realistic tampering operations: rotation, frame dropping, masking, substitution, and repetition. Built upon VLMEvalKit, our modular evaluation framework ensures reproducible and standardized performance analysis. Extensive experiments reveal substantial robustness disparities across mainstream VLMs: InternVL2-8B demonstrates strong resilience, whereas models such as Llama-VILA1.5-8B suffer severe performance degradation. MVTamperBench is publicly released, facilitating the development and rigorous evaluation of tamper-resilient VLMs.
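To make the five tampering operations concrete, here is a minimal sketch of how each could be simulated on a clip represented as a NumPy array of shape `(T, H, W, C)`. These helpers are illustrative assumptions, not the benchmark's actual implementation; function names and parameters are hypothetical.

```python
import numpy as np

# Illustrative tampering operations on a video tensor of shape (T, H, W, C).
def rotate(video, k=1):
    """Rotate every frame by k * 90 degrees (spatial axes 1 and 2)."""
    return np.rot90(video, k=k, axes=(1, 2))

def drop(video, start, length):
    """Remove a contiguous run of frames (frame dropping)."""
    return np.delete(video, np.s_[start:start + length], axis=0)

def mask(video, y0, y1, x0, x1):
    """Black out a fixed spatial region in every frame (masking)."""
    out = video.copy()
    out[:, y0:y1, x0:x1, :] = 0
    return out

def substitute(video, start, length, rng):
    """Replace a segment with unrelated (here: random-noise) frames."""
    out = video.copy()
    out[start:start + length] = rng.integers(
        0, 256, size=(length,) + video.shape[1:], dtype=video.dtype)
    return out

def repeat(video, start, length):
    """Duplicate a segment in place, producing a stutter (repetition)."""
    seg = video[start:start + length]
    return np.concatenate(
        [video[:start + length], seg, video[start + length:]], axis=0)

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(16, 32, 32, 3), dtype=np.uint8)
print(rotate(clip).shape)        # (16, 32, 32, 3) for square frames
print(drop(clip, 4, 4).shape)    # (12, 32, 32, 3)
print(repeat(clip, 4, 4).shape)  # (20, 32, 32, 3)
```

A robustness evaluation would feed both the clean clip and each tampered variant to a VLM and compare task accuracy across the two conditions.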
📝 Abstract
Recent advancements in Vision-Language Models (VLMs) have enabled significant progress in complex video understanding tasks. However, their robustness to real-world manipulations remains underexplored, limiting their reliability in critical applications. To address this gap, we introduce MVTamperBench, a comprehensive benchmark designed to evaluate VLMs' resilience to video tampering effects, including rotation, dropping, masking, substitution, and repetition. By systematically assessing state-of-the-art models, MVTamperBench reveals substantial variability in robustness: models like InternVL2-8B achieve high performance, while others, such as Llama-VILA1.5-8B, exhibit severe vulnerabilities. To foster broader adoption and reproducibility, MVTamperBench is integrated into VLMEvalKit, a modular evaluation toolkit, enabling streamlined testing and facilitating advancements in model robustness. Our benchmark represents a critical step towards developing tamper-resilient VLMs, ensuring their dependability in real-world scenarios. Project Page: https://amitbcp.github.io/MVTamperBench/