Brain-to-Speech: Prosody Feature Engineering and Transformer-Based Reconstruction

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the communication needs of individuals with speech impairments by proposing a high-fidelity speech reconstruction method based on intracranial electroencephalography (iEEG). The approach systematically extracts prosodic features—such as intonation, pitch, and rhythm—from iEEG signals and introduces a novel Transformer-based encoder architecture that explicitly incorporates prosodic information. This is the first work to achieve prosody-driven natural speech synthesis in brain-to-speech tasks. Experimental results demonstrate that the proposed method significantly outperforms baseline models, including Griffin–Lim and CNN-based approaches, both in objective metrics and subjective listening evaluations. The synthesized speech exhibits markedly improved intelligibility and expressiveness, highlighting the critical role of neural prosody encoding in reconstructing naturalistic vocal output from brain activity.
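The summary names intonation, pitch, and rhythm as the prosodic cues at the heart of the method. The chapter's actual feature pipeline is not reproduced on this page, so the following is only a hypothetical NumPy sketch of what per-frame prosodic features of this kind look like on the audio side: a crude pitch contour from the autocorrelation peak plus a log-energy track as a rhythm proxy. All function names, frame lengths, and pitch-range bounds below are illustrative assumptions, not the authors' pipeline (which predicts such targets from iEEG rather than computing them from audio).

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of length frame_len."""
    n = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, fs, fmin=75.0, fmax=400.0):
    """Crude F0 estimate: pick the autocorrelation peak in [fmin, fmax] Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag search range in samples
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

def prosody_features(x, fs, frame_len=400, hop=160):
    """Per-frame pitch (Hz) and log-energy: a minimal prosody descriptor."""
    frames = frame_signal(x, frame_len, hop)
    f0 = np.array([pitch_autocorr(f, fs) for f in frames])
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    return np.stack([f0, energy], axis=1)    # shape: (n_frames, 2)

# Sanity check on a synthetic 200 Hz tone (1 s at 16 kHz):
fs = 16000
t = np.arange(fs) / fs
feats = prosody_features(np.sin(2 * np.pi * 200.0 * t), fs)
```

On the pure tone, the pitch column should sit near 200 Hz for every frame; on real speech one would additionally need voicing detection and smoothing, which are omitted here for brevity.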
📝 Abstract
This chapter presents a novel approach to brain-to-speech (BTS) synthesis from intracranial electroencephalography (iEEG) data, emphasizing prosody-aware feature engineering and advanced transformer-based models for high-fidelity speech reconstruction. Driven by growing interest in decoding speech directly from brain activity, this work integrates neuroscience, artificial intelligence, and signal processing to generate accurate and natural speech. We introduce a pipeline for extracting key prosodic features, including intonation, pitch, and rhythm, directly from complex iEEG signals, and employ advanced deep learning models to exploit these features for natural-sounding speech. Furthermore, this chapter introduces a transformer encoder architecture designed specifically for brain-to-speech tasks. Unlike conventional models, our architecture integrates the extracted prosodic features to significantly enhance speech reconstruction, yielding generated speech with improved intelligibility and expressiveness. A detailed evaluation demonstrates superior performance over established baselines, such as traditional Griffin-Lim and CNN-based reconstruction, across both quantitative and perceptual metrics. By demonstrating these advances in feature extraction and transformer-based learning, this chapter contributes to the growing field of AI-driven neuroprosthetics, paving the way for assistive technologies that restore communication to individuals with speech impairments. Finally, we discuss promising future research directions, including the integration of diffusion models and real-time inference systems.
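The abstract describes a transformer encoder that integrates extracted prosodic features with the neural signal. The chapter's architecture is not detailed on this page, so the sketch below shows only one simple, hypothetical way such conditioning can work: concatenating a prosody contour (e.g. pitch and energy per frame) onto the iEEG-derived feature sequence before a standard encoder layer. It is a minimal pure-NumPy, single-head, untrained layer with random weights; every dimension and the concatenation scheme are assumptions for illustration, not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def encoder_layer(x, params):
    """One post-norm transformer encoder layer: attention + FFN, residuals."""
    Wq, Wk, Wv, W1, W2 = params
    x = layer_norm(x + self_attention(x, Wq, Wk, Wv))
    ff = np.maximum(x @ W1, 0.0) @ W2          # 2-layer ReLU feed-forward
    return layer_norm(x + ff)

# Hypothetical shapes: 50 time frames, 32 neural dims, 2 prosody dims.
d_neural, d_prosody, T = 32, 2, 50
d_model = d_neural + d_prosody                  # 34 after concatenation
neural = rng.standard_normal((T, d_neural))     # iEEG-derived features
prosody = rng.standard_normal((T, d_prosody))   # e.g. pitch + energy contours
x = np.concatenate([neural, prosody], axis=1)   # prosody-conditioned input

params = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3)] + \
         [rng.standard_normal((d_model, 4 * d_model)) * 0.1,
          rng.standard_normal((4 * d_model, d_model)) * 0.1]
out = encoder_layer(x, params)   # one (T, d_model) frame embedding sequence
```

A real system would stack several such layers, add positional encodings, and train the whole stack to predict spectrogram frames that a vocoder (or a Griffin-Lim baseline) converts to a waveform; concatenation is just the simplest of several possible conditioning schemes (others include cross-attention or FiLM-style modulation).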
Problem

Research questions and friction points this paper is trying to address.

brain-to-speech
prosody
speech reconstruction
iEEG
neuroprosthetics
Innovation

Methods, ideas, or system contributions that make the work stand out.

prosody-aware feature engineering
transformer-based reconstruction
brain-to-speech
intracranial EEG
neural speech synthesis
Mohammed Salah Al-Radhi
Department of Telecommunications and Artificial Intelligence, Budapest University of Technology and Economics, Budapest, Hungary
Géza Németh
Department of Telecommunications and Artificial Intelligence, Budapest University of Technology and Economics, Budapest, Hungary