🤖 AI Summary
Prior work lacks empirical evaluation of multimodal large language models (MLLMs) for visual interpretation in the real-world, daily-life contexts of blind and low-vision (BLV) users, particularly regarding trustworthiness, usability, and high-stakes scenarios such as medication dosage identification.
Method: We conducted a two-week diary study in which 20 BLV participants used an MLLM-enabled visual interpretation application, yielding 553 authentic diary entries; a preliminary mixed-methods analysis of 60 entries from 6 participants examined explanation trustworthiness, satisfaction, and performance across visual tasks.
Contribution/Results: In the preliminary analysis, participants rated the application's visual interpretations as trustworthy (mean 3.75/5) and satisfying (mean 4.15/5), and they trusted it even in high-stakes scenarios such as receiving medical dosage advice. The study addresses a gap in understanding how MLLM-enabled visual interpretation has changed BLV users' application use in authentic daily settings; the completed analysis is intended to inform the design of future MLLM-enabled visual interpretation systems.
📝 Abstract
Blind and Low Vision (BLV) people have adopted AI-powered visual interpretation applications to address their daily needs. While these applications have been helpful, prior work has found that users remain dissatisfied with the applications' frequent errors. Recently, multimodal large language models (MLLMs) have been integrated into visual interpretation applications, and they show promise for more descriptive visual interpretations. However, it is still unknown how this advancement has changed people's use of these applications. To address this gap, we conducted a two-week diary study in which 20 BLV people used an MLLM-enabled visual interpretation application we developed, and we collected 553 diary entries. In this paper, we report a preliminary analysis of 60 diary entries from 6 participants. We found that participants considered the application's visual interpretations trustworthy (mean 3.75 out of 5) and satisfying (mean 4.15 out of 5). Moreover, participants trusted our application in high-stakes scenarios, such as receiving medical dosage advice. We discuss our plan to complete our analysis to inform the design of future MLLM-enabled visual interpretation systems.