MVTamperBench: Evaluating Robustness of Vision-Language Models

📅 2024-12-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a critical reliability gap: the robustness of vision-language models (VLMs) to video tampering. To this end, the authors introduce MVTamperBench, the first benchmark dedicated to evaluating VLM robustness against video tampering, systematically assessing five realistic tampering operations: rotation, dropping, masking, substitution, and repetition. Built on VLMEvalKit, the modular evaluation framework ensures reproducible, standardized performance analysis. Extensive experiments reveal substantial robustness disparities across mainstream VLMs: InternVL2-8B demonstrates strong resilience, whereas models such as Llama-VILA1.5-8B suffer severe performance degradation. MVTamperBench is publicly released, facilitating the development and rigorous evaluation of tamper-resilient VLMs.

📝 Abstract
Recent advancements in Vision-Language Models (VLMs) have enabled significant progress in complex video understanding tasks. However, their robustness to real-world manipulations remains underexplored, limiting their reliability in critical applications. To address this gap, we introduce MVTamperBench, a comprehensive benchmark designed to evaluate VLMs' resilience to video tampering effects, including rotation, dropping, masking, substitution, and repetition. By systematically assessing state-of-the-art models, MVTamperBench reveals substantial variability in robustness, with models like InternVL2-8B achieving high performance, while others, such as Llama-VILA1.5-8B, exhibit severe vulnerabilities. To foster broader adoption and reproducibility, MVTamperBench is integrated into VLMEvalKit, a modular evaluation toolkit, enabling streamlined testing and facilitating advancements in model robustness. Our benchmark represents a critical step towards developing tamper-resilient VLMs, ensuring their dependability in real-world scenarios. Project Page: https://amitbcp.github.io/MVTamperBench/
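The five tampering effects named in the abstract can be pictured as edits to a contiguous segment of a video's frame sequence. The sketch below is purely illustrative, modeling a clip as a list of frame identifiers; the function name `tamper`, the string labels, and the segment convention are assumptions for exposition, not the benchmark's actual implementation (which operates on real video frames via VLMEvalKit).

```python
def tamper(frames, kind, start, length):
    """Schematically apply one of the five tampering operations
    (rotation, dropping, masking, substitution, repetition) to the
    segment frames[start:start+length] of a clip.

    Illustrative sketch only: frames are plain identifiers, and each
    operation is represented symbolically rather than as pixel-level
    video editing.
    """
    out = list(frames)
    seg = slice(start, start + length)
    if kind == "rotation":        # rotate each frame in the segment
        out[seg] = [f"rot90({f})" for f in out[seg]]
    elif kind == "dropping":      # delete the segment entirely
        del out[seg]
    elif kind == "masking":       # blank out the segment
        out[seg] = ["MASKED" for _ in out[seg]]
    elif kind == "substitution":  # swap in unrelated content
        out[seg] = [f"foreign_{i}" for i in range(length)]
    elif kind == "repetition":    # duplicate the segment in place
        out[start:start] = out[seg]
    else:
        raise ValueError(f"unknown tampering kind: {kind}")
    return out

clip = [f"frame_{i}" for i in range(8)]
tampered = tamper(clip, "dropping", 2, 3)  # 8 frames -> 5 frames
```

Note that dropping and repetition change the clip's length while the other three preserve it, which is one reason temporal reasoning in VLMs can be stressed differently by each operation.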
Problem

Research questions and friction points this paper is trying to address.

Visual Language Models
Video Tampering
Reliability Assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

MVTamperBench
Video Tampering Resistance
Visual Language Models