🤖 AI Summary
This work proposes a text-free neural information retrieval method that directly maps electroencephalography (EEG) signals to passage representations, addressing the query-formulation difficulties faced by visually impaired users and in voice-based interaction scenarios where cognitive or physical impairments make formulating queries difficult. It presents the first systematic validation of auditory EEG for neural retrieval and introduces a cross-modal (auditory + visual) EEG joint-training strategy to mitigate data scarcity. Evaluated on the Alice (auditory) and Nieuwland (visual) datasets using a dual-encoder architecture with multiple pooling strategies, the cross-modal approach substantially improves retrieval performance over single-modality training, achieving an MRR of 0.474 (+31%), Hit@1 of 0.314 (+43%), and Hit@10 of 0.858 (+28%), and surpasses the BM25 text baseline (MRR 0.474 vs 0.428).
📝 Abstract
Query formulation from internal information needs remains fundamentally challenging across all Information Retrieval paradigms due to cognitive complexity and physical impairments. Brain Passage Retrieval (BPR) addresses this by directly mapping EEG signals to passage representations without intermediate text translation. However, existing BPR research exclusively uses visual stimuli, leaving critical questions unanswered: Can auditory EEG enable effective retrieval for voice-based interfaces and visually impaired users? Can training on combined EEG datasets from different sensory modalities improve performance despite severe data scarcity? We present the first systematic investigation of auditory EEG for BPR and evaluate cross-sensory training benefits. Using dual encoder architectures with four pooling strategies (CLS, mean, max, multi-vector), we conduct controlled experiments comparing auditory-only, visual-only, and combined training on the Alice (auditory) and Nieuwland (visual) datasets. Results demonstrate that auditory EEG consistently outperforms visual EEG, and cross-sensory training with CLS pooling achieves substantial improvements over individual training: 31% in MRR (0.474), 43% in Hit@1 (0.314), and 28% in Hit@10 (0.858). Critically, combined auditory EEG models surpass BM25 text baselines (MRR: 0.474 vs 0.428), establishing neural queries as competitive with traditional retrieval whilst enabling accessible interfaces. These findings validate auditory neural interfaces for IR tasks and demonstrate that cross-sensory training addresses data scarcity whilst outperforming single-modality approaches. Code: https://github.com/NiallMcguire/Audio_BPR
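To make the dual-encoder setup and the four pooling strategies concrete, the sketch below shows one plausible way an EEG query encoder could produce CLS, mean, max, or multi-vector representations and score them against passage vectors. The Transformer-based EEG encoder, all dimensions, and the `score` function are illustrative assumptions, not the authors' implementation; the passage side is assumed to come from a separate text encoder producing vectors of the same dimensionality.

```python
# Hypothetical dual-encoder sketch with CLS / mean / max / multi-vector pooling.
# Layer choices, dimensions, and scoring are assumptions, not the paper's config.
import torch
import torch.nn as nn


class EEGQueryEncoder(nn.Module):
    def __init__(self, n_channels=64, d_model=256, n_layers=4, pooling="cls"):
        super().__init__()
        self.pooling = pooling
        self.proj = nn.Linear(n_channels, d_model)            # per-timestep projection
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))    # learned [CLS] token
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, eeg):                      # eeg: (batch, time, channels)
        x = self.proj(eeg)
        cls = self.cls.expand(x.size(0), -1, -1)
        h = self.encoder(torch.cat([cls, x], dim=1))   # (batch, 1 + time, d_model)
        if self.pooling == "cls":
            return h[:, 0]                       # vector at the [CLS] position
        if self.pooling == "mean":
            return h[:, 1:].mean(dim=1)          # average over time steps
        if self.pooling == "max":
            return h[:, 1:].max(dim=1).values    # element-wise max over time steps
        return h[:, 1:]                          # multi-vector: keep all time-step vectors


def score(query_repr, passage_repr):
    """Dot-product relevance; max-sim late interaction for multi-vector pooling."""
    if query_repr.dim() == 2:                    # single-vector pooling strategies
        return query_repr @ passage_repr.T       # (n_queries, n_passages)
    # multi-vector: for each query vector, take its best-matching passage vector, then sum
    sim = torch.einsum("qtd,psd->qpts", query_repr, passage_repr)
    return sim.max(dim=-1).values.sum(dim=-1)    # (n_queries, n_passages)
```

Under this reading, the single-vector strategies differ only in how the EEG sequence is collapsed before the dot product, while the multi-vector strategy defers interaction to scoring time.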