From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants

📅 2026-04-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work proposes the first multimodal large language model (LLM) assistant that integrates eye-tracking data with first-person video to perceive users' cognitive difficulties during interaction. By analyzing real-time gaze behavior, the system models the user's cognitive state, identifies comprehension bottlenecks, and provides retrospective, personalized assistance. This approach pioneers the incorporation of eye-movement signals into LLM-based interactive frameworks, improving both the precision of support and users' information recall. Experimental results show that, compared to a text-only LLM assistant, the proposed system achieves higher accuracy and personalization ratings while requiring less user input, making interactions more efficient overall.
๐Ÿ“ Abstract
Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and target follow-up retrospective assistance. We instantiate this vision in a controlled study (n=36) comparing the gaze-aware AI assistant to a text-only LLM assistant. Compared to a conventional LLM assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users' reading behavior and significantly improved people's ability to recall information. Users spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits in comprehension and challenges when interpretations of gaze behaviors were inaccurate. Our findings suggest that gaze-aware LLM assistants can reason about cognitive needs to improve cognitive outcomes of users.
Problem

Research questions and friction points this paper is trying to address.

gaze-aware AI
cognitive needs
multimodal LLM
user assistance
behavioral context
Innovation

Methods, ideas, or system contributions that make the work stand out.

gaze-aware AI
multimodal LLM
cognitive needs
egocentric vision
retrospective assistance