🤖 AI Summary
This study identifies two critical limitations of large multimodal models (LMMs) in assisting visually impaired users: insufficient awareness of social context and an inability to infer user intent. Drawing on in-depth interviews and an analysis of real-world image descriptions gathered from study participants and social media platforms, examined through qualitative coding and contextual analysis, we identify two fundamental bottlenecks: “contextual hallucination” and “intent misalignment.” Based on these findings, we propose seven actionable design principles. These principles outline an interaction paradigm that integrates human-AI and AI-AI collaboration, with the aim of moving LMMs toward embodied, intent-aware assistive technologies that are more effective, interactive, and personalized.
📝 Abstract
Large multimodal models (LMMs) have enabled new AI-powered applications that help people with visual impairments (PVI) receive natural language descriptions of their surroundings through audible text. We investigated how this emerging paradigm of visual assistance transforms how PVI perform and manage their daily tasks. Moving beyond usability assessments, we examined both the capabilities and limitations of LMM-based tools in personal and social contexts, while exploring design implications for their future development. Through interviews with 14 visually impaired users of Be My AI (an LMM-based application) and analysis of its image descriptions from both study participants and social media platforms, we identified two key limitations. First, these systems' context awareness suffers from hallucinations and misinterpretations of social contexts, styles, and human identities. Second, their intent-oriented capabilities often fail to grasp and act on users' intentions. Based on these findings, we propose design strategies for improving both human-AI and AI-AI interactions, contributing to the development of more effective, interactive, and personalized assistive technologies.