Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects

📅 2025-05-08
🤖 AI Summary
Widespread deployment of vision-language models (VLMs) faces two critical challenges: user trust deficits and the absence of rigorous, interdisciplinary evaluation frameworks. Method: a multidisciplinary investigation (systematic literature review, cognitive modeling, empirical user studies, participatory workshops, and meta-analysis) develops the first taxonomy of trust in human-VLM interaction, integrating insights from cognitive science, collaborative agent theory, and human factors engineering. Contribution/Results: the study identifies six fundamental trust-related challenges and four key research directions; proposes a practical, implementation-oriented framework for trust assessment and enhancement; and delivers a theoretically grounded, empirically informed roadmap for designing, evaluating, and deploying trustworthy VLMs, bridging foundational theory with actionable design principles and evaluation methodologies.

📝 Abstract
The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting and informing users about when to trust these systems. This survey reviews studies on trust dynamics in user-VLM interactions through a multidisciplinary taxonomy encompassing cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future VLM trust studies.
Problem

Research questions and friction points this paper is trying to address.

Understanding user trust dynamics in Vision Language Models
Exploring multidisciplinary factors affecting VLM trustworthiness
Establishing requirements for future VLM trust research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reviewing trust dynamics in user-VLM interactions
Multidisciplinary taxonomy spanning cognitive capabilities, collaboration modes, and agent behaviours
Workshop insights with prospective users informing requirements for future trust studies