🤖 AI Summary
Widespread deployment of vision-language models (VLMs) is hindered by user trust deficits and the absence of rigorous, interdisciplinary evaluation frameworks. Method: We conduct a multidisciplinary investigation, combining a systematic literature review, cognitive modeling, empirical user studies, participatory workshops, and meta-analysis, to develop the first taxonomy of trust in human-VLM interaction, integrating insights from cognitive science, collaborative agent theory, and human factors engineering. Contribution/Results: The study identifies six fundamental trust-related challenges and four key research directions; proposes a practical, implementation-oriented framework for trust assessment and enhancement; and delivers a comprehensive, theoretically grounded, empirically informed roadmap for designing, evaluating, and deploying trustworthy VLMs, bridging foundational theory with actionable design principles and evaluation methodologies.
📝 Abstract
The rapid adoption of Vision Language Models (VLMs), pre-trained on large image-text and video-text datasets, calls for protecting users and informing them about when to trust these systems. This survey reviews studies of trust dynamics in user-VLM interactions through a multidisciplinary taxonomy encompassing cognitive science capabilities, collaboration modes, and agent behaviours. Literature insights and findings from a workshop with prospective VLM users inform preliminary requirements for future studies of trust in VLMs.