Towards Understanding the Use of MLLM-Enabled Applications for Visual Interpretation by Blind and Low Vision People

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work lacks empirical evaluation of multimodal large language models (MLLMs) for visual interpretation in the real-world, daily-life contexts of blind and low vision (BLV) users, particularly regarding trustworthiness, usability, and high-stakes scenarios such as medication dosage identification. Method: a two-week diary study in which 20 BLV participants used an MLLM-enabled visual interpretation application, yielding 553 entries; a preliminary mixed-methods analysis of 60 entries from 6 participants examined interpretation quality, trust, and satisfaction across everyday visual tasks. Contribution/Results: participants rated the application's interpretations as trustworthy (mean 3.75/5) and satisfying (mean 4.15/5), and trusted it even in high-stakes scenarios such as receiving medical dosage advice. By evaluating MLLM-enabled visual interpretation with BLV users in authentic daily settings, the study begins to close a gap in accessibility research, and the planned full analysis aims to inform the design of future MLLM-enabled visual interpretation systems.

📝 Abstract
Blind and Low Vision (BLV) people have adopted AI-powered visual interpretation applications to address their daily needs. While these applications have been helpful, prior work has found that users remain unsatisfied by their frequent errors. Recently, multimodal large language models (MLLMs) have been integrated into visual interpretation applications, and they show promise for more descriptive visual interpretations. However, it is still unknown how this advancement has changed people's use of these applications. To address this gap, we conducted a two-week diary study in which 20 BLV people used an MLLM-enabled visual interpretation application we developed, and we collected 553 entries. In this paper, we report a preliminary analysis of 60 diary entries from 6 participants. We found that participants considered the application's visual interpretations trustworthy (mean 3.75 out of 5) and satisfying (mean 4.15 out of 5). Moreover, participants trusted our application in high-stakes scenarios, such as receiving medical dosage advice. We discuss our plan to complete our analysis to inform the design of future MLLM-enabled visual interpretation systems.
Problem

Research questions and friction points this paper is trying to address.

Evaluate MLLM-enabled visual interpretation for BLV users
Assess trust and satisfaction in MLLM visual interpretation
Explore MLLM impact on high-stakes scenarios for BLV
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLM integration enables more descriptive visual interpretations
Two-week diary study with 20 BLV participants (553 entries)
High trust in MLLM interpretations even in high-stakes scenarios (e.g., medical dosage advice)
Ricardo E. Gonzalez Penuela
Cornell University, New York, USA
Ruiying Hu
Cornell Tech, New York, New York, USA
Sharon Lin
Google DeepMind
Tanisha Shende
Oberlin College, Oberlin, Ohio, USA
Shiri Azenkot
Cornell Tech, New York, New York, USA