The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning

📅 2024-06-06
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Cross-subject, cross-device, and cross-task speech decoding from neural activity is hampered by poor data reusability: inter-individual anatomical variability, divergent experimental paradigms, and heterogeneous acquisition hardware make recordings difficult to pool. Method: a neuroscience-guided self-supervised learning framework combining a pretraining objective aligned with known neural mechanisms, a dedicated decoder architecture, and a multi-center MEG data alignment strategy. Trained on ~400 hours of heterogeneous MEG data from 900 subjects across diverse sites and protocols, the model enables large-scale, scalable representation learning. Contribution/Results: the approach overcomes subject-specific limitations, achieving, on non-invasive MEG data, speech decoding accuracy comparable to that of clinical invasive recordings for the first time. It improves cross-subject, cross-dataset, and cross-task generalization by 15–27%, establishing a new paradigm for universal brain–computer interfaces.
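The self-supervised pretraining idea summarized above can be sketched, very loosely, as a masked-reconstruction objective: hide random time steps of a multi-channel recording and score a predictor on the hidden values. Everything below is illustrative only; the function name, the trivial per-channel-mean "predictor", and the masking scheme are assumptions for the sketch, not the paper's actual neuroscience-informed objectives or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_reconstruction_loss(x, mask_frac=0.25, rng=rng):
    """Toy self-supervised objective: mask random time steps of a
    (channels x time) recording and compute MSE of a trivial
    'predictor' (per-channel mean of the visible samples) on the
    hidden values. Illustrates the objective's shape only."""
    n_ch, n_t = x.shape
    n_mask = max(1, int(mask_frac * n_t))
    masked_idx = rng.choice(n_t, size=n_mask, replace=False)
    visible = np.ones(n_t, dtype=bool)
    visible[masked_idx] = False
    # "Prediction": per-channel mean over visible time steps.
    pred = x[:, visible].mean(axis=1, keepdims=True)
    # Loss is evaluated only on the masked (hidden) samples.
    return float(np.mean((x[:, masked_idx] - pred) ** 2))

# A constant signal is perfectly predictable (loss ~0);
# unstructured noise is not.
flat = np.ones((3, 100))
noisy = rng.standard_normal((3, 100))
print(masked_reconstruction_loss(flat))
print(masked_reconstruction_loss(noisy))
```

In a real system the mean predictor would be replaced by a trainable encoder, and heterogeneous datasets would first be mapped into a shared channel/time layout so one model can be pretrained across sites and subjects.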

📝 Abstract
The past few years have seen remarkable progress in the decoding of speech from brain activity, primarily driven by large single-subject datasets. However, due to individual variation, such as anatomy, and differences in task design and scanning hardware, leveraging data across subjects and datasets remains challenging. In turn, the field has not benefited from the growing number of open neural data repositories to exploit large-scale deep learning. To address this, we develop neuroscience-informed self-supervised objectives, together with an architecture, for learning from heterogeneous brain recordings. Scaling to nearly 400 hours of MEG data and 900 subjects, our approach shows generalisation across participants, datasets, tasks, and even to novel subjects. It achieves improvements of 15-27% over state-of-the-art models and matches surgical decoding performance with non-invasive data. These advances unlock the potential for scaling speech decoding models beyond the current frontier.
Problem

Research questions and friction points this paper is trying to address.

Brain Information Utilization
Deep Learning Performance Improvement
Language Decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Brain-inspired Learning Method
Non-invasive Data
Language Decoding Performance