SIREM: Speech-Informed MRI Reconstruction with Learned Sampling

📅 2026-05-18
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Real-time magnetic resonance imaging (rtMRI) for speech research is constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, which lead to undersampled k-space and degraded reconstructions. This work proposes SIREM, a framework that uses synchronized speech audio as a cross-modal prior to guide MRI reconstruction. SIREM fuses an audio-driven structure-prediction branch with a k-space-based MRI reconstruction branch through a spatial soft-weighting map, and adds a learnable soft weighting profile over spiral arms so that k-space arm usage can be studied differentiably alongside the fusion. Evaluated on the USC speech rtMRI benchmark against gridding, wavelet-based compressed sensing, and total variation baselines, SIREM operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure, establishing an initial benchmark for multimodal, speech-informed rtMRI reconstruction.
📝 Abstract
Real-time magnetic resonance imaging (rtMRI) of speech production enables non-invasive visualization of dynamic vocal-tract motion and is valuable for speech science and clinical assessment. However, rtMRI is fundamentally constrained by trade-offs among spatial resolution, temporal resolution, and acquisition speed, often leading to undersampled k-space measurements and degraded reconstructions. We propose SIREM, a speech-informed MRI reconstruction framework that uses synchronized speech as a cross-modal prior. The central idea is that vocal-tract configurations during speech are correlated with the produced acoustics, making part of the image content predictable from audio. SIREM models each frame as a fusion of an audio-driven component and an MRI-driven component through a spatial weighting map. The audio branch predicts articulator-related structure from speech, while the MRI branch reconstructs complementary content from measured k-space data. We further introduce a learnable soft weighting profile over spiral arms, enabling a differentiable study of how k-space arm usage interacts with speech-informed fusion. This yields a unified multimodal formulation that combines audio-driven prediction, MRI reconstruction, and sampling adaptation. We evaluate SIREM on the USC speech rtMRI benchmark against standard baselines, including gridding, wavelet-based compressed sensing, and total variation. SIREM introduces a speech-informed reconstruction paradigm that operates in a substantially higher-throughput regime than iterative methods while preserving anatomically plausible vocal-tract structure. These results establish an initial benchmark for multimodal speech-informed rtMRI reconstruction and highlight the potential of synchronized speech as an auxiliary prior for fast reconstruction. The source code is available at https://github.com/mdhasanai/SIREM
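The fusion and arm-weighting ideas in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the forms used here are assumptions: a pixel-wise convex combination of the two branch outputs with a sigmoid-parameterized weight map, and a sigmoid weight per spiral arm. The function names (fuse_frame, weight_spiral_arms) and the toy inputs are hypothetical and are not taken from the SIREM repository.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse_frame(audio_img, mri_img, weight_logits):
    """Blend two images pixel by pixel with a spatial weighting map.

    Assumed form: fused = w * audio + (1 - w) * mri, w = sigmoid(logit).
    All three inputs are 2-D lists of the same shape.
    """
    fused = []
    for a_row, m_row, l_row in zip(audio_img, mri_img, weight_logits):
        fused.append([
            sigmoid(l) * a + (1.0 - sigmoid(l)) * m
            for a, m, l in zip(a_row, m_row, l_row)
        ])
    return fused


def weight_spiral_arms(arms, arm_logits):
    """Scale every k-space sample of arm i by a soft weight in (0, 1).

    Assumed form: one scalar logit per arm; samples may be complex.
    """
    return [[sigmoid(l) * s for s in arm] for arm, l in zip(arms, arm_logits)]


# Toy 2x2 frame: the audio branch predicts all ones, the MRI branch all zeros.
audio_img = [[1.0, 1.0], [1.0, 1.0]]
mri_img = [[0.0, 0.0], [0.0, 0.0]]
# Logit 0 gives an equal blend; large positive or negative logits pick one branch.
weight_logits = [[0.0, 20.0], [-20.0, 0.0]]
fused = fuse_frame(audio_img, mri_img, weight_logits)

# Two toy spiral arms with complex samples, both at logit 0 (weight 0.5).
weighted_arms = weight_spiral_arms([[1 + 1j, 2 + 0j], [4 + 0j]], [0.0, 0.0])
```

Because both the weight map and the arm weights are smooth functions of their logits, a parameterization of this kind stays differentiable, which is the property the abstract relies on when it describes a differentiable study of arm usage.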
Problem

Research questions and friction points this paper is trying to address.

real-time MRI
speech production
undersampled k-space
image reconstruction
vocal-tract imaging
Innovation

Methods, ideas, or system contributions that make the work stand out.

speech-informed MRI
multimodal reconstruction
learned k-space sampling
real-time MRI
cross-modal prior
👥 Authors
Md Hasan
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Nyvenn Castro
Institute of Radiology, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Daiqi Liu
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Lukas Mulzer
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Jana Hutter
UKER/FAU Erlangen // King's College London
Magnetic Resonance Imaging, Perinatal Imaging, Quantitative Imaging
Jonghye Woo
Associate Professor of Radiology, Harvard Medical School | MGH
Medical Image Analysis, Medical Imaging, Computer Vision, Machine Learning, Speech
Moritz Zaiss
Ph.D.
MRI, CEST, self learning MR, Pulseq
Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Paula A. Pérez-Toro
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany