🤖 AI Summary
This study addresses the challenge of end-to-end reconstruction of high-fidelity, polyphonic, harmonically rich natural music directly from raw, non-invasive EEG signals. Methodologically, it applies latent diffusion models (LDMs) to the EEG-to-audio decoding task, which the authors present as a first for this setting, mapping raw temporal EEG directly to complex audio waveforms without handcrafted features, channel selection, or manual signal preprocessing. A neural-embedding-based evaluation metric is proposed to better quantify auditory perceptual consistency. Experiments on the NMED-T dataset demonstrate substantial improvements in timbral and structural fidelity; reconstructed audio exhibits superior intelligibility and musicality compared to conventional linear models and VAE baselines. Key contributions include: (1) the first LDM framework explicitly designed for natural-music synthesis from EEG; (2) a full-brain, end-to-end decoding paradigm that maps raw EEG to high-quality audio; and (3) a semantics-aware evaluation framework tailored for neural decoding.
📝 Abstract
In this article, we explore the potential of latent diffusion models, a family of powerful generative models, for reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into high-quality general music reconstruction from non-invasive EEG data, using an end-to-end approach trained directly on raw data, with no manual pre-processing or channel selection. We train our models on the public NMED-T dataset and evaluate them quantitatively with neural embedding-based metrics that we propose. Our work contributes to ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data to reconstruct complex auditory information.
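The abstract does not spell out how the neural embedding-based metrics are computed. As a rough illustration only, the sketch below shows one common way such a metric could be built: embed both the reconstructed and the ground-truth audio clips with some pretrained audio encoder (not shown here, and not specified by the source), then compare the embeddings. The function names `cosine_similarity`, `mean_embedding_similarity`, and `top1_identification_accuracy` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def mean_embedding_similarity(recon: Sequence[Vector],
                              target: Sequence[Vector]) -> float:
    """Average cosine similarity between paired reconstruction/target embeddings."""
    if len(recon) != len(target) or not recon:
        raise ValueError("need equally sized, non-empty embedding lists")
    sims = [cosine_similarity(r, t) for r, t in zip(recon, target)]
    return sum(sims) / len(sims)


def top1_identification_accuracy(recon: Sequence[Vector],
                                 target: Sequence[Vector]) -> float:
    """Fraction of reconstructions whose most similar target is the paired one.

    Assumption for this sketch: recon[i] is meant to reconstruct target[i].
    """
    if len(recon) != len(target) or not recon:
        raise ValueError("need equally sized, non-empty embedding lists")
    hits = 0
    for i, r in enumerate(recon):
        sims = [cosine_similarity(r, t) for t in target]
        best = max(range(len(target)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(recon)


if __name__ == "__main__":
    # Toy embeddings standing in for the output of an (assumed) audio encoder.
    recon = [[1.0, 0.0], [0.0, 1.0]]
    target = [[2.0, 0.0], [0.0, 3.0]]
    print(mean_embedding_similarity(recon, target))
    print(top1_identification_accuracy(recon, target))
```

Comparing in embedding space rather than sample by sample is attractive for this task because two waveforms can sound alike while differing substantially at the sample level; the choice of encoder would strongly influence what the score actually measures.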