From Minutes to Days: Scaling Intracranial Speech Decoding with Supervised Pretraining

📅 2025-12-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Speech decoding in brain–computer interfaces (BCIs) is limited by scarce training data and cross-day neural representation drift in intracranial electrocorticography (ECoG). Method: Leveraging the first large-scale, week-long clinical dataset of synchronized ECoG and speech recordings—increasing training samples over 100-fold—we propose: (1) a supervised pretraining paradigm for end-to-end speech decoding initialization using real clinical neural signals; (2) explicit modeling of cross-day neural drift via a contrastive learning architecture, empirically quantifying this variability; and (3) cross-day temporal alignment for joint ECoG–speech representation learning. Results: Our approach significantly outperforms models trained on short-term experimental data, with performance gains scaling logarithmically with data volume. It achieves robust, high-fidelity speech reconstruction in realistic clinical settings, establishing a new benchmark for clinical BCI speech decoding.

Technology Category

Application Category

📝 Abstract
Decoding speech from brain activity has typically relied on limited neural recordings collected during short and highly controlled experiments. Here, we introduce a framework to leverage week-long intracranial and audio recordings from patients undergoing clinical monitoring, effectively increasing the training dataset size by over two orders of magnitude. With this pretraining, our contrastive learning model substantially outperforms models trained solely on classic experimental data, with gains that scale log-linearly with dataset size. Analysis of the learned representations reveals that, while brain activity represents speech features, its global structure largely drifts across days, highlighting the need for models that explicitly account for cross-day variability. Overall, our approach opens a scalable path toward decoding and modeling brain representations in both real-life and controlled task settings.
Problem

Research questions and friction points this paper is trying to address.

Decodes speech from week-long intracranial recordings
Uses pretraining to scale neural speech decoding
Addresses cross-day variability in brain activity representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging week-long intracranial and audio recordings for training
Using contrastive learning model with supervised pretraining
Explicitly accounting for cross-day variability in brain activity
🔎 Similar Papers
No similar papers found.
L
Linnea Evanson
Hospital Foundation Adolphe De Rothschild, Paris, France
Mingfang Zhang
Mingfang Zhang
The University of Tokyo
Computer Vision
Hubert Banville
Hubert Banville
FAIR, AI at Meta
machine learningbrain decodingbiosignals
S
Saarang Panchavati
Meta AI
P
Pierre Bourdillon
Hospital Foundation Adolphe De Rothschild, Paris, France; Integrative Neuroscience & Cognition Center, Paris Cité University, Paris, France
Jean-Rémi King
Jean-Rémi King
Meta
neuroscienceartificial intelligencehuman cognitiondecoding