Multi-class Decoding of Attended Speaker Direction Using Electroencephalogram and Audio Spatial Spectrum

📅 2024-11-11

🤖 AI Summary

This study addresses fine-grained decoding of the attended speaker's direction (14 classes) from listeners' EEG signals, a step toward brain-computer interfaces for individuals with hearing impairment. Method: Moving beyond binary left/right decoding, the authors combine two modalities, EEG and audio spatial spectra, to address the low accuracy of EEG-only models. CNN, LSM-CNN, and Deformer models are evaluated on these inputs, including the proposed Sp-EEG-Deformer. Contribution/Results: With a 1-second decision window, Sp-EEG-Deformer reaches 14-class accuracies of 55.35% (leave-one-subject-out) and 57.19% (leave-one-trial-out), and EEG-only models are significantly less accurate. Decoding accuracy increases as the number of candidate directions decreases. The results support the proposed dual-modal strategy for directional focus decoding.
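
For context, a rough chance level can be worked out for the 14-class task. This assumes all 14 directions are equally likely, which the summary does not state:

```latex
\[
P_{\text{chance}} = \frac{1}{14} \approx 7.14\%, \qquad
\frac{55.35\%}{7.14\%} \approx 7.7, \qquad
\frac{57.19\%}{7.14\%} \approx 8.0
\]
```

Under that assumption, the reported accuracies are roughly eight times what uniform guessing would give.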

📝 Abstract

Decoding the directional focus of an attended speaker from listeners' electroencephalogram (EEG) signals is essential for developing brain-computer interfaces to improve the quality of life for individuals with hearing impairment. Previous works have concentrated on binary directional focus decoding, i.e., determining whether the attended speaker is on the left or right side of the listener. However, a more precise decoding of the exact direction of the attended speaker is necessary for effective speech processing. Additionally, audio spatial information has not been effectively leveraged, resulting in suboptimal decoding results. This paper finds that, on a recently presented dataset with 14-class directional focus, models relying exclusively on EEG inputs exhibit significantly lower accuracy when decoding the directional focus in both leave-one-subject-out and leave-one-trial-out scenarios. By integrating audio spatial spectra with EEG features, the decoding accuracy can be effectively improved. The CNN, LSM-CNN, and Deformer models are employed to decode the directional focus from listeners' EEG signals and audio spatial spectra. The proposed Sp-EEG-Deformer model achieves notable 14-class decoding accuracies of 55.35% and 57.19% in leave-one-subject-out and leave-one-trial-out scenarios, respectively, with a decision window of 1 second. Experimental results indicate that decoding accuracy increases as the number of alternative directions decreases. These findings suggest the efficacy of the proposed dual-modal directional focus decoding strategy.
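
The two evaluation scenarios and the 1-second decision window can be illustrated with a minimal sketch. It is not the authors' pipeline: the 128 Hz sampling rate, the toy subject and trial labels, and the helper names `split_into_windows` and `leave_one_group_out` are all assumptions made for illustration, and the abstract does not say whether trials are held out within or across subjects (the sketch pools trial labels across subjects).

```python
from typing import Iterator, List, Sequence, Tuple


def split_into_windows(n_samples: int, fs: int, window_s: float = 1.0) -> List[Tuple[int, int]]:
    """Return (start, stop) sample indices of non-overlapping decision windows."""
    win = int(round(fs * window_s))
    return [(s, s + win) for s in range(0, n_samples - win + 1, win)]


def leave_one_group_out(groups: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out, train_idx, test_idx), holding out one group label per fold."""
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical toy metadata: one entry per decision window.
subjects = ["s1", "s1", "s2", "s2", "s3", "s3"]
trials = ["t1", "t2", "t1", "t2", "t1", "t2"]

# Assumed 5 s recording at an assumed 128 Hz, cut into 1 s decision windows.
windows = split_into_windows(n_samples=640, fs=128, window_s=1.0)

# Leave-one-subject-out: the test subject is never seen during training.
loso_folds = list(leave_one_group_out(subjects))

# Leave-one-trial-out: the test trial is never seen during training.
loto_folds = list(leave_one_group_out(trials))
```

Leave-one-subject-out measures generalization to an unseen listener, while leave-one-trial-out measures generalization to an unseen recording, which is why the two accuracies are reported separately.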

Problem

Research questions and friction points this paper is trying to address.

Decoding the attended speaker's direction from EEG

Low accuracy of EEG-only models

Fine-grained (14-class) rather than binary left/right directions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint use of EEG and audio spatial spectra

Sp-EEG-Deformer model

14-class decoding of attended speaker direction
