Beyond Visual Perception: Insights from Smartphone Interaction of Visually Impaired Users with Large Multimodal Models

📅 2025-02-22
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This study identifies critical limitations of large multimodal models (LMMs) in assisting visually impaired users: insufficient social context awareness and an inability to infer user intent. Drawing on in-depth interviews, analysis of real-world image descriptions, and data triangulated across user studies and social media platforms, analyzed through qualitative coding and contextual analysis, the authors surface two fundamental bottlenecks: “contextual hallucination” and “intent misalignment.” From these findings they derive seven actionable design principles, which sketch an interaction paradigm combining human-AI and AI-AI collaboration and push LMMs toward embodied, intent-aware assistive technologies, with the aim of making accessibility systems more effective, interactive, and personalized.

Technology Category

Large multimodal models (LMMs)

Application Category

Accessibility and assistive technology

📝 Abstract
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.
Problem

Research questions and friction points this paper addresses.

Improving context awareness in LMMs
Enhancing intent-oriented capabilities
Designing personalized assistive technologies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using large multimodal models to assist visually impaired users
Analyzing LMM limitations in social contexts
Proposing design strategies for better human-AI and AI-AI interactions (see the sketch below)
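
The “intent-oriented” direction above is an interaction-design idea rather than a specified algorithm, but a small sketch can make it concrete. The Python below is a hypothetical illustration, not the authors' implementation or Be My AI's API: the `UserContext`, `build_description_prompt`, and `call_lmm` names are invented here. The idea is that instead of requesting a generic caption, the assistant forwards the user's stated goal and known sensitive topics (such as human identity or style) to the model and asks it to flag uncertainty rather than guess.

```python
# Hypothetical sketch of intent-aware image description, inspired by the
# paper's design direction. All names here are illustrative inventions,
# not the authors' code or Be My AI's API.
from dataclasses import dataclass
from typing import List


@dataclass
class UserContext:
    intent: str              # the user's stated goal, e.g. "pick an outfit"
    sensitivities: List[str] # topics needing care, e.g. ["human identity"]


def build_description_prompt(ctx: UserContext) -> str:
    """Compose a prompt that foregrounds the user's goal rather than
    requesting a generic scene caption."""
    cautions = ", ".join(ctx.sensitivities)
    return (
        f"The user is blind and wants to: {ctx.intent}. "
        f"Describe only what is relevant to that goal. "
        f"If the image involves {cautions}, state your uncertainty "
        f"explicitly instead of guessing."
    )


def call_lmm(prompt: str, image: bytes) -> str:
    """Stub for a real multimodal model call; wire this to any
    vision-capable chat endpoint."""
    raise NotImplementedError("connect an actual LMM client here")


def describe(image: bytes, ctx: UserContext) -> str:
    """End-to-end helper: build the intent-aware prompt, then query the model."""
    return call_lmm(build_description_prompt(ctx), image)


if __name__ == "__main__":
    ctx = UserContext(
        intent="choose a shirt that matches navy trousers",
        sensitivities=["clothing style", "color under indoor lighting"],
    )
    print(build_description_prompt(ctx))
```

Running the script just prints the composed prompt; `call_lmm` is deliberately left as a stub so the sketch stays independent of any particular model provider.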