EgoBrain: Synergizing Minds and Eyes For Human Action Understanding

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modeling the brain–eye coordination mechanism during daily activities remains challenging due to the lack of large-scale, temporally aligned multimodal neurobehavioral data. To address this, we introduce the first large-scale, temporally synchronized EEG–first-person video dataset, comprising 61 hours of 32-channel EEG and corresponding egocentric video recordings from 40 participants across 29 naturalistic activities; to our knowledge, this is the first long-duration, high-fidelity simultaneous acquisition of neural and egocentric visual signals. We propose a Transformer-based temporal feature fusion framework for the EEG and visual modalities, coupled with a cross-domain adaptive training strategy, establishing a robust, subject- and environment-invariant paradigm for action understanding. Our method achieves 66.70% top-1 action recognition accuracy, significantly outperforming unimodal baselines. The complete dataset, acquisition protocol, and open-source analysis toolkit are publicly released to advance the integration of brain–computer interfaces and multimodal AI and to enable open science in cognitive computing.
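The fusion idea described above — projecting per-window EEG and video features into a shared space and letting a Transformer attend jointly over both temporal streams before classifying the action — can be sketched as follows. This is a minimal illustration, not the paper's actual architecture: all layer sizes, the mean-pooling readout, and the two-stream concatenation are assumptions; only the channel count (32), class count (29), and the Transformer-fusion concept come from the summary.

```python
import torch
import torch.nn as nn

class EEGVisionFusion(nn.Module):
    """Hypothetical sketch of Transformer-based EEG-vision temporal fusion.

    Per-timestep EEG features and per-frame video features are projected to
    a shared embedding dimension, concatenated along the time axis, fused by
    a Transformer encoder, then mean-pooled and classified. Dimensions are
    illustrative assumptions, not the published model.
    """

    def __init__(self, eeg_dim=32, vid_dim=512, d_model=128, num_classes=29):
        super().__init__()
        self.eeg_proj = nn.Linear(eeg_dim, d_model)   # EEG -> shared space
        self.vid_proj = nn.Linear(vid_dim, d_model)   # video -> shared space
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, eeg, vid):
        # eeg: (batch, T_eeg, eeg_dim); vid: (batch, T_vid, vid_dim)
        tokens = torch.cat([self.eeg_proj(eeg), self.vid_proj(vid)], dim=1)
        fused = self.encoder(tokens)         # joint attention over both streams
        return self.head(fused.mean(dim=1))  # pool over time, classify action

model = EEGVisionFusion()
# e.g. 2 clips: 50 EEG windows of 32 channel features, 10 video frame features
logits = model(torch.randn(2, 50, 32), torch.randn(2, 10, 512))
print(logits.shape)  # torch.Size([2, 29]): one score per action class
```

Concatenating the two token streams lets cross-modal attention happen inside the encoder itself; a late-fusion variant would instead encode each modality separately and merge pooled embeddings before the classifier.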

📝 Abstract
The integration of brain-computer interfaces (BCIs), in particular electroencephalography (EEG), with artificial intelligence (AI) has shown tremendous promise in decoding human cognition and behavior from neural signals. In particular, the rise of multimodal AI models has brought new possibilities that were never imagined before. Here, we present EgoBrain, the world's first large-scale, temporally aligned multimodal dataset that synchronizes egocentric vision and EEG of the human brain over extended periods of time, establishing a new paradigm for human-centered behavior analysis. This dataset comprises 61 hours of synchronized 32-channel EEG recordings and first-person video from 40 participants engaged in 29 categories of daily activities. We then developed a multimodal learning framework to fuse EEG and vision for action understanding, validated across both cross-subject and cross-environment challenges, achieving an action recognition accuracy of 66.70%. EgoBrain paves the way for a unified brain-computer interface framework spanning multiple modalities. All data, tools, and acquisition protocols are openly shared to foster open science in cognitive computing.
Problem

Research questions and friction points this paper is trying to address.

Integrating EEG and AI to decode human cognition and behavior
Creating a synchronized multimodal dataset for human action analysis
Developing a framework for EEG-vision fusion in action understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale synchronized EEG-vision multimodal dataset
Multimodal learning framework for action understanding
Openly shared data and tools for cognitive computing