🤖 AI Summary
Existing two-stream networks for EEG decoding typically process spatial and temporal features independently and fuse them only at a late stage, which limits their ability to capture deep couplings between the two feature types. To address this, the work proposes LI-DSN, a layer-wise interactive dual-stream network that progressively and dynamically fuses spatial and temporal features at every layer through a Temporal-Spatial Integration Attention (TSIA) mechanism. TSIA uses a Spatial Affinity Correlation Matrix and a cosine-gated Temporal Channel Aggregation Matrix to guide the interaction between the streams, complemented by an adaptive fusion strategy with learnable channel weights. Evaluated on eight EEG datasets, the proposed method significantly outperforms thirteen state-of-the-art models, demonstrating superior decoding accuracy and robustness in motor imagery, emotion recognition, and steady-state visual evoked potential tasks.
📝 Abstract
Electroencephalography (EEG) provides a non-invasive window into brain activity, offering the high temporal resolution crucial for understanding and interacting with neural processes through brain-computer interfaces (BCIs). Current dual-stream neural networks for EEG often process temporal and spatial features independently in parallel branches, delaying their integration until a final, late-stage fusion. This design inherently leads to an "information silo" problem: it precludes intermediate cross-stream refinement and hinders the spatial-temporal decompositions essential for full feature utilization. We propose LI-DSN, a layer-wise interactive dual-stream network that enables progressive cross-stream communication at each layer, thereby overcoming the limitations of late-fusion paradigms. LI-DSN introduces a novel Temporal-Spatial Integration Attention (TSIA) mechanism, which constructs a Spatial Affinity Correlation Matrix (SACM) to capture inter-electrode spatial structural relationships and a Temporal Channel Aggregation Matrix (TCAM) to integrate cosine-gated temporal dynamics under spatial guidance. Furthermore, we employ an adaptive fusion strategy with learnable channel weights to optimize the integration of dual-stream features. Extensive experiments on eight diverse EEG datasets, covering motor imagery (MI) classification, emotion recognition, and steady-state visual evoked potentials (SSVEP), consistently demonstrate that LI-DSN significantly outperforms 13 state-of-the-art (SOTA) baseline models, showcasing its superior robustness and decoding performance. The code will be made publicly available upon acceptance.
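The abstract names the TSIA components but does not give their formulas, and the authors' code is not yet public. The sketch below is therefore only one plausible reading, built entirely on our own assumptions: each stream is a C x T matrix (C electrodes, T time steps); the SACM is a row-softmax of Pearson correlations between electrode rows of the spatial-stream features; the TCAM multiplies those affinities by a non-negative cosine-similarity gate computed on the temporal-stream features and renormalizes each row; and adaptive fusion is a per-channel convex combination with weight sigmoid(logit_c). All function names (`spatial_affinity`, `temporal_aggregation`, `adaptive_fuse`, `tsia_layer`) are hypothetical, and the fusion logits are passed in as plain numbers rather than trained.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = electrodes (channels), columns = time steps


def _pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    den = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    return sum(x * y for x, y in zip(da, db)) / den if den > 0 else 0.0


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    den = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    return sum(x * y for x, y in zip(a, b)) / den if den > 0 else 0.0


def _row_softmax(m: Matrix) -> Matrix:
    """Numerically stable softmax applied to each row independently."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def spatial_affinity(s: Matrix) -> Matrix:
    """Hypothetical SACM: row-softmax of inter-electrode Pearson correlations."""
    c = len(s)
    return _row_softmax([[_pearson(s[i], s[j]) for j in range(c)] for i in range(c)])


def temporal_aggregation(x: Matrix, sacm: Matrix) -> Matrix:
    """Hypothetical TCAM: spatial affinities gated by max(0, cosine) of temporal features."""
    c = len(x)
    tcam = []
    for i in range(c):
        row = [sacm[i][j] * max(0.0, _cosine(x[i], x[j])) for j in range(c)]
        total = sum(row)
        if total > 0:
            tcam.append([v / total for v in row])
        else:  # all-zero channel: fall back to keeping the channel unchanged
            tcam.append([1.0 if j == i else 0.0 for j in range(c)])
    return tcam


def adaptive_fuse(s: Matrix, t: Matrix, logits: List[float]) -> Matrix:
    """Per-channel convex mix: w_c * s + (1 - w_c) * t, with w_c = sigmoid(logit_c)."""
    out = []
    for c, (srow, trow) in enumerate(zip(s, t)):
        w = 1.0 / (1.0 + math.exp(-logits[c]))
        out.append([w * a + (1.0 - w) * b for a, b in zip(srow, trow)])
    return out


def tsia_layer(s: Matrix, x: Matrix, logits: List[float]) -> Matrix:
    """One hypothetical interaction step: spatially guided temporal aggregation, then fusion."""
    sacm = spatial_affinity(s)
    tcam = temporal_aggregation(x, sacm)
    x_guided = matmul(tcam, x)  # each channel becomes a weighted mix of related channels
    return adaptive_fuse(s, x_guided, logits)


s = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.1], [3.0, 1.0, 0.0]]   # toy spatial-stream features
x = [[0.5, 0.1, -0.2], [0.4, 0.2, -0.1], [-0.3, 0.0, 0.6]]  # toy temporal-stream features
fused = tsia_layer(s, x, logits=[0.0, 0.0, 0.0])  # sigmoid(0) = 0.5: equal weighting
```

In a layer-wise design such a step would be applied after every layer of both branches, rather than once before the classifier as in late fusion. With all logits at zero the fusion is an even average; a large positive logit lets a channel keep mostly its spatial-stream features, which is one way learnable channel weights could express per-electrode preferences.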