🤖 AI Summary
This work addresses the critical reliability gap of vision-language models (VLMs) under video tampering. To this end, we introduce MVTamperBench—the first benchmark dedicated to evaluating VLM robustness against video tampering. We propose the first video-level tampering robustness evaluation paradigm, systematically assessing five realistic tampering operations: rotation, frame dropping, masking, substitution, and repetition. Built upon VLMEvalKit, our modular evaluation framework ensures reproducible and standardized performance analysis. Extensive experiments reveal substantial robustness disparities across mainstream VLMs: InternVL2-8B demonstrates strong resilience, whereas models such as Llama-VILA1.5-8B suffer severe performance degradation. MVTamperBench is publicly released, facilitating the development and rigorous evaluation of tamper-resilient VLMs.
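To make the five tampering operations concrete, here is a minimal sketch of how each could be simulated on a clip represented as a NumPy array of shape `(T, H, W, C)`. These helpers are illustrative assumptions, not the benchmark's actual implementation; function names and parameters are hypothetical.

```python
import numpy as np

# Illustrative tampering operations on a video tensor of shape (T, H, W, C).
def rotate(video, k=1):
    """Rotate every frame by k * 90 degrees (spatial axes 1 and 2)."""
    return np.rot90(video, k=k, axes=(1, 2))

def drop(video, start, length):
    """Remove a contiguous run of frames (frame dropping)."""
    return np.delete(video, np.s_[start:start + length], axis=0)

def mask(video, y0, y1, x0, x1):
    """Black out a fixed spatial region in every frame (masking)."""
    out = video.copy()
    out[:, y0:y1, x0:x1, :] = 0
    return out

def substitute(video, start, length, rng):
    """Replace a segment with unrelated (here: random-noise) frames."""
    out = video.copy()
    out[start:start + length] = rng.integers(
        0, 256, size=(length,) + video.shape[1:], dtype=video.dtype)
    return out

def repeat(video, start, length):
    """Duplicate a segment in place, producing a stutter (repetition)."""
    seg = video[start:start + length]
    return np.concatenate(
        [video[:start + length], seg, video[start + length:]], axis=0)

rng = np.random.default_rng(0)
clip = rng.integers(0, 256, size=(16, 32, 32, 3), dtype=np.uint8)
print(rotate(clip).shape)        # (16, 32, 32, 3) for square frames
print(drop(clip, 4, 4).shape)    # (12, 32, 32, 3)
print(repeat(clip, 4, 4).shape)  # (20, 32, 32, 3)
```

A robustness evaluation would feed both the clean clip and each tampered variant to a VLM and compare task accuracy across the two conditions.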
📝 Abstract
Recent advancements in Vision-Language Models (VLMs) have enabled significant progress in complex video understanding tasks. However, their robustness to real-world manipulations remains underexplored, limiting their reliability in critical applications. To address this gap, we introduce MVTamperBench, a comprehensive benchmark designed to evaluate VLMs' resilience to video tampering effects, including rotation, dropping, masking, substitution, and repetition. By systematically assessing state-of-the-art models, MVTamperBench reveals substantial variability in robustness: models like InternVL2-8B achieve high performance, while others, such as Llama-VILA1.5-8B, exhibit severe vulnerabilities. To foster broader adoption and reproducibility, MVTamperBench is integrated into VLMEvalKit, a modular evaluation toolkit, enabling streamlined testing and facilitating advancements in model robustness. Our benchmark represents a critical step towards developing tamper-resilient VLMs, ensuring their dependability in real-world scenarios. Project Page: https://amitbcp.github.io/MVTamperBench/