VMDT: Decoding the Trustworthiness of Video Foundation Models

📅 2025-11-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current video foundation models (T2V/V2T) lack a unified, comprehensive benchmark for evaluating trustworthiness across safety, hallucination, fairness, privacy, and adversarial robustness. To address this gap, we propose VMDT, the first multidimensional trustworthiness evaluation platform designed specifically for video foundation models. VMDT introduces standardized test suites and quantitative metrics, integrating both human and automated assessment protocols. We systematically evaluate seven T2V and nineteen V2T models. Key findings include: (i) model scale exhibits no significant correlation with safety performance; (ii) open-source T2V models frequently generate harmful content; (iii) unfairness in video models is markedly higher than in image-based counterparts; (iv) in V2T models, privacy and fairness risks increase with scale, whereas hallucination and adversarial robustness improve yet remain low overall. VMDT provides a reproducible, extensible benchmark framework and diagnostic toolkit to advance research on trustworthy video foundation models.

📝 Abstract
As foundation models become more sophisticated, ensuring their trustworthiness becomes increasingly critical; yet, unlike text and image, the video modality still lacks comprehensive trustworthiness benchmarks. We introduce VMDT (Video-Modal DecodingTrust), the first unified platform for evaluating text-to-video (T2V) and video-to-text (V2T) models across five key trustworthiness dimensions: safety, hallucination, fairness, privacy, and adversarial robustness. Through our extensive evaluation of 7 T2V models and 19 V2T models using VMDT, we uncover several significant insights. For instance, all open-source T2V models evaluated fail to recognize harmful queries and often generate harmful videos, while exhibiting higher levels of unfairness compared to image modality models. In V2T models, unfairness and privacy risks rise with scale, whereas hallucination and adversarial robustness improve -- though overall performance remains low. Uniquely, safety shows no correlation with model size, implying that factors other than scale govern current safety levels. Our findings highlight the urgent need for developing more robust and trustworthy video foundation models, and VMDT provides a systematic framework for measuring and tracking progress toward this goal. The code is available at https://sunblaze-ucb.github.io/VMDT-page/.
Problem

Research questions and friction points this paper is trying to address.

Evaluating trustworthiness of video foundation models across multiple dimensions
Assessing safety, hallucination, fairness, privacy, and adversarial robustness
Benchmarking text-to-video and video-to-text model reliability systematically
Innovation

Methods, ideas, or system contributions that make the work stand out.

VMDT platform evaluates the trustworthiness of video foundation models
Assesses five dimensions: safety, hallucination, fairness, privacy, adversarial robustness
Systematic framework measures progress toward robust, trustworthy video models
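To make the platform idea above concrete, here is a minimal, hypothetical sketch of how per-dimension benchmark scores might be aggregated into a model-level trustworthiness report. The dimension names come from the paper, but the scoring scheme, function names, and numbers are illustrative assumptions, not VMDT's actual code or metrics.

```python
# Hypothetical aggregation of per-dimension trustworthiness scores for one model.
# Dimension names follow the paper; scores and aggregation are illustrative only.

DIMENSIONS = [
    "safety",
    "hallucination",
    "fairness",
    "privacy",
    "adversarial_robustness",
]


def aggregate_scores(per_dimension: dict) -> dict:
    """Average each dimension's per-test scores (higher = more trustworthy),
    then compute an unweighted overall mean across dimensions with data."""
    report = {}
    for dim in DIMENSIONS:
        scores = per_dimension.get(dim, [])
        report[dim] = sum(scores) / len(scores) if scores else float("nan")
    valid = [v for v in report.values() if v == v]  # drop NaN entries
    report["overall"] = sum(valid) / len(valid) if valid else float("nan")
    return report


# Illustrative scores for a single (fictional) model.
example = {
    "safety": [0.8, 0.6],
    "hallucination": [0.5],
    "fairness": [0.7, 0.9],
    "privacy": [0.4],
    "adversarial_robustness": [0.6, 0.6],
}

print(aggregate_scores(example))
```

An unweighted mean is only one design choice; a real benchmark might weight dimensions or report them separately, as the paper's per-dimension findings suggest scale affects each dimension differently.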
👥 Authors
Yujin Potter
UC Berkeley
AI Alignment, AI Safety

Zhun Wang
Graduate Student, UC Berkeley

Nicholas Crispino
PhD Student, University of California, Santa Cruz
Natural Language Processing

Kyle Montgomery
UC Santa Cruz
Deep Learning, Natural Language Processing

Alexander Xiong
University of California, Berkeley

Ethan Y. Chang
University of Illinois at Urbana-Champaign

Francesco Pinto
Research Scientist, Google DeepMind
Agentic AI Safety and Security

Yuqi Chen
University of California, Santa Cruz

Rahul Gupta
Amazon

Morteza Ziyadi
Amazon

Christos Christodoulopoulos
Principal Technology Adviser, Information Commissioner's Office
Computational Linguistics, Fact Verification, Responsible AI

Bo Li
University of Chicago

Chenguang Wang
University of California, Santa Cruz

D. Song
University of California, Berkeley