🤖 AI Summary
Despite growing interest in multimodal large language models (MLLMs) for visual assistance, their effectiveness and accessibility bottlenecks for blind and low-vision (BLV) users remain poorly understood. Method: We conduct a user-informed study to identify key challenges and introduce the first BLV-centered multimodal benchmark, comprising five tasks: optical Braille recognition, assistive object localization, cultural context understanding, multilingual description generation, and video scene parsing. We further propose the first accessibility- and culture-aware MLLM evaluation framework and perform a cross-model empirical analysis of 12 state-of-the-art models. Contribution/Results: Our evaluation reveals significant shortcomings across all models in Braille recognition, fine-grained object assistance, cultural adaptation, multilingual support, and hallucination mitigation. The study establishes an evidence base for accessible AI, introduces a human-centered evaluation paradigm, and outlines actionable improvement pathways for inclusive multimodal systems.
📝 Abstract
This paper explores the effectiveness of Multimodal Large Language Models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and the key challenges users face with such technologies. Despite a high adoption rate of these models, our findings highlight concerns related to contextual understanding, cultural sensitivity, and complex scene understanding, particularly for individuals who may rely solely on them for visual interpretation. Informed by these results, we collate five user-centred tasks with image and video inputs, including a novel task on Optical Braille Recognition. Our systematic evaluation of twelve MLLMs reveals that further advancements are necessary to overcome limitations related to cultural context, multilingual support, Braille reading comprehension, assistive object recognition, and hallucinations. This work provides critical insights into the future direction of multimodal AI for accessibility, underscoring the need for more inclusive, robust, and trustworthy visual assistance technologies.