🤖 AI Summary
This work addresses real-time cognitive load assessment on smart eyewear, where existing approaches either depend on bulky sensors or rely on eye-tracking models that suffer from poor interpretability, weak generalization, and the need for task-specific fine-tuning. To overcome these limitations, we propose GazeMind, a framework that encodes eye-movement data into structured representations and uses large language models for task-guided reasoning, enabling zero-shot generalization across scenarios without fine-tuning. GazeMind further incorporates user-specific traits and historical context to deliver personalized, interpretable cognitive load predictions. Evaluated on our newly curated dataset, CogLoad-Bench (152 participants, over 40 hours of recordings, and more than 10,000 annotated samples), GazeMind achieves state-of-the-art performance, outperforming baselines by over 20% across all metrics.
📝 Abstract
Smart glasses with AI assistants are increasingly used in daily life. However, current systems have no access to the user's cognitive load and therefore lack awareness of the user's internal cognitive state, leaving them unable to proactively anticipate the user's needs. Existing methods for assessing cognitive load either rely on sensors that are impractical for lightweight eyewear or use gaze-based models that suffer from poor interpretability, require task-specific fine-tuning, and often fail to generalize across individuals. We propose GazeMind, a gaze-guided LLM agent framework for cognitive load assessment on smart glasses. It encodes eye-tracking data into structured representations for LLM-based reasoning and provides interpretable cognitive load predictions. Importantly, GazeMind generalizes across scenarios without LLM fine-tuning through a novel task-guidance reasoning approach, and it achieves personalized adaptation by incorporating user-specific characteristics and historical references. To support evaluation, we introduce CogLoad-Bench, the largest gaze-based cognitive load dataset, with 152 participants, 40+ hours of multimodal data, and 10K+ real-time annotations across controlled and real-world tasks. Experiments show that GazeMind achieves state-of-the-art performance, outperforming baselines by over 20% across all metrics.
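To make the idea of encoding gaze data into a structured representation for LLM reasoning more concrete, the following is a minimal sketch, not the GazeMind implementation. The feature names (`mean_fixation_ms`, `saccades_per_s`, `mean_pupil_mm`, `blinks_per_min`), the `GazeWindow` class, the `build_prompt` helper, and the prompt wording are all hypothetical; the abstract does not specify which features or prompt format the authors use. The sketch only assembles the prompt text and does not call any LLM.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class GazeWindow:
    """Summary statistics for one window of eye-tracking data (hypothetical fields)."""
    mean_fixation_ms: float
    saccades_per_s: float
    mean_pupil_mm: float
    blinks_per_min: float


def build_prompt(window, task_hint, user_traits, history):
    """Assemble a structured text prompt; no LLM is called here."""
    payload = {
        "task_guidance": task_hint,        # assumed form of task guidance
        "user_traits": user_traits,        # assumed form of user-specific traits
        "history": history[-3:],           # keep only the three most recent references
        "gaze_features": asdict(window),
    }
    return (
        "Estimate the user's cognitive load (low, medium, or high) "
        "and explain your reasoning.\n" + json.dumps(payload, indent=2)
    )


window = GazeWindow(
    mean_fixation_ms=310.0,
    saccades_per_s=2.4,
    mean_pupil_mm=4.1,
    blinks_per_min=9.0,
)
prompt = build_prompt(
    window,
    task_hint="reading dense text while walking",
    user_traits={"baseline_pupil_mm": 3.6},
    history=[{"load": "low"}, {"load": "medium"}],
)
print(prompt)
```

Because the task description, user traits, and history travel in the prompt rather than in model weights, a design along these lines could in principle be moved to a new scenario by changing `task_hint` alone, which is one plausible reading of how generalization without fine-tuning might be achieved.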