Scattering Transform for Auditory Attention Decoding

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of electroencephalography (EEG)-based auditory attention decoding in hearing aids under the "cocktail party" scenario by introducing, for the first time, a two-layer scattering transform into the preprocessing pipeline. This approach effectively extracts task-relevant time–frequency invariant features, substantially enhancing model generalization. The proposed method integrates filter banks with synchrosqueezed short-time Fourier transforms and combines convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and Transformer or graph neural networks for classification. Evaluated on the KUL dataset, the framework significantly outperforms existing methods, and demonstrates superior performance on the DTU dataset under large-data regimes or specific model configurations, confirming the informational gain and cross-speaker robustness conferred by the scattering transform.

📝 Abstract
The use of hearing aids will increase in the coming years due to demographic change. One open problem that remains to be solved by a new generation of hearing aids is the cocktail party problem. A possible solution is electroencephalography-based auditory attention decoding, which has been the subject of several studies in recent years, most of which use the same preprocessing methods. In this work, the scattering transform is proposed as an alternative to these preprocessing methods. The two-layer scattering transform is compared with a regular filter bank, the synchrosqueezed short-time Fourier transform, and the common preprocessing. To demonstrate the performance, the known and the proposed preprocessing methods are compared for different classification tasks on two widely used datasets, provided by KU Leuven (KUL) and the Technical University of Denmark (DTU). Both established and newer neural-network models (CNNs, LSTMs, and recent Transformer- and graph-based models) are used for classification. Various evaluation strategies are compared, with a focus on classifying speakers unseen during training. We show that the two-layer scattering transform can significantly improve performance for subject-related conditions, especially on the KUL dataset. On the DTU dataset, however, this holds only for some of the models, or when larger amounts of training data are provided, as in 10-fold cross-validation. This suggests that the scattering transform is capable of extracting additional relevant information.
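The two-layer scattering transform described above cascades wavelet-like bandpass filtering with a complex modulus, then averages in time to obtain stable, shift-invariant features. The following is a minimal numpy sketch of that idea, not the authors' implementation: the Gaussian filter shapes, the `sigma` bandwidth, and the chosen center frequencies are illustrative assumptions.

```python
import numpy as np

def gabor_filter(n, center, sigma):
    """Frequency-domain Gaussian bandpass centered at `center` (cycles/sample)."""
    freqs = np.fft.fftfreq(n)
    return np.exp(-((freqs - center) ** 2) / (2 * sigma ** 2))

def wavelet_modulus(x_hat, centers, sigma):
    """Filter with each bandpass, return the modulus of each subband in time."""
    out = []
    for c in centers:
        filt = gabor_filter(x_hat.shape[-1], c, sigma)
        out.append(np.abs(np.fft.ifft(x_hat * filt)))
    return np.stack(out)

def scattering2(x, centers1, centers2, sigma=0.025):
    """Two-layer scattering: cascaded bandpass + modulus, then time averaging.

    Returns first-order coefficients (one per layer-1 filter) and second-order
    coefficients along the standard frequency-decreasing paths (c2 < c1).
    """
    u1 = wavelet_modulus(np.fft.fft(x), centers1, sigma)   # layer 1: |x * psi_c1|
    s1 = u1.mean(axis=-1)                                  # time-averaged order 1
    s2 = []
    for i, c1 in enumerate(centers1):
        for c2 in centers2:
            if c2 < c1:                                    # only coarser second layer
                u2 = wavelet_modulus(np.fft.fft(u1[i]), [c2], sigma)
                s2.append(u2.mean())                       # time-averaged order 2
    return s1, np.array(s2)
```

In practice, a library such as Kymatio (`Scattering1D`) provides properly normalized Morlet filter banks; this sketch only illustrates the cascade structure (filter, modulus, filter, modulus, average) that makes the features invariant to small time shifts.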
Problem

Research questions and friction points this paper is trying to address.

cocktail party problem
auditory attention decoding
hearing aids
EEG-based decoding
speech separation
Innovation

Methods, ideas, or system contributions that make the work stand out.

scattering transform
auditory attention decoding
EEG preprocessing
cocktail party problem
neural network classification
René Pallenberg
Institute for Signal Processing, University of Luebeck, 23562 Luebeck, Germany
Fabrice Katzberg
Institute for Signal Processing, University of Luebeck, 23562 Luebeck, Germany; and Department for Vision and Machine Learning, FPI Food Processing Innovation GmbH & Co. KG, 23556 Luebeck, Germany
Alfred Mertins
University of Lübeck
Marco Maass
German Research Center for Artificial Intelligence (DFKI), AI for Assistive Health Technologies, 23562 Luebeck, Germany