🤖 AI Summary
This work addresses EEG-driven auditory attention detection (AAD) in complex acoustic environments. We propose a brain-inspired spiking neural network (SNN) framework built on a spike-driven symmetric dual-branch architecture that models lightweight 1D token sequences with token-channel mixers, augmented by a biologically inspired cross-modal feature fusion strategy for efficient collaborative learning of complementary EEG features. On three public AAD benchmarks, the model matches state-of-the-art decoding accuracy while reducing parameter count by 14.7× and inference energy consumption by 5.8× relative to recent ANN methods. The framework thus balances high decoding accuracy with ultra-low-power deployment, offering a practical, lightweight neural decoding paradigm for neuro-steered intelligent hearing aids.
📝 Abstract
Auditory attention detection (AAD) aims to decode listeners' focus in complex auditory environments from electroencephalography (EEG) recordings, which is crucial for developing neuro-steered hearing devices. Despite recent advancements, EEG-based AAD remains hindered by the absence of synergistic frameworks that can fully leverage complementary EEG features under energy-efficiency constraints. We propose S$^2$M-Former, a novel spiking symmetric mixing framework to address this limitation through two key innovations: i) Presenting a spike-driven symmetric architecture composed of parallel spatial and frequency branches with mirrored modular design, leveraging biologically plausible token-channel mixers to enhance complementary learning across branches; ii) Introducing lightweight 1D token sequences to replace conventional 3D operations, reducing parameters by 14.7$\times$. The brain-inspired spiking architecture further reduces power consumption, achieving a 5.8$\times$ energy reduction compared to recent ANN methods, while also surpassing existing SNN baselines in terms of parameter efficiency and performance. Comprehensive experiments on three AAD benchmarks (KUL, DTU and AV-GC-AAD) across three settings (within-trial, cross-trial and cross-subject) demonstrate that S$^2$M-Former achieves comparable state-of-the-art (SOTA) decoding accuracy, making it a promising low-power, high-performance solution for AAD tasks.
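The two core ideas named in the abstract, flattening EEG features into 1D token sequences and mixing them alternately along the token and channel axes with binary spike activations, can be sketched roughly as follows. This is a minimal illustrative NumPy sketch, not the paper's implementation: the shapes, weights, residual wiring, and simple Heaviside activation (standing in for whatever spiking neuron model S$^2$M-Former actually uses) are all assumptions.

```python
import numpy as np

def spike(x, threshold=1.0):
    # Heaviside spike activation: emit a binary spike wherever the input
    # exceeds the threshold. Illustrative stand-in for a spiking neuron
    # model (e.g. LIF dynamics with surrogate gradients in training).
    return (x >= threshold).astype(x.dtype)

def token_channel_mixer(tokens, w_token, w_channel):
    # tokens: (T, C) 1D token sequence -- T tokens, C feature channels.
    # Token mixing: linear map across the token axis, then spikes,
    # with a residual connection (an assumed design, for stability).
    tokens = tokens + spike(w_token @ tokens)      # (T, T) @ (T, C) -> (T, C)
    # Channel mixing: linear map across the channel axis, then spikes.
    tokens = tokens + spike(tokens @ w_channel)    # (T, C) @ (C, C) -> (T, C)
    return tokens

rng = np.random.default_rng(0)
T, C = 16, 8                                # illustrative token/channel counts
eeg_tokens = rng.standard_normal((T, C))    # stand-in for one branch's tokens
w_token = rng.standard_normal((T, T)) * 0.1
w_channel = rng.standard_normal((C, C)) * 0.1

out = token_channel_mixer(eeg_tokens, w_token, w_channel)
print(out.shape)  # (16, 8)
```

Because every activation is a 0/1 spike, the mixing stages reduce to sparse accumulate operations on neuromorphic hardware, which is the source of the energy savings the abstract reports; operating on (T, C) token matrices rather than 3D spatial-spectral tensors is what keeps the parameter count low.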