Open-Source Manually Annotated Vocal Tract Database for Automatic Segmentation from 3D MRI Using Deep Learning: Benchmarking 2D and 3D Convolutional and Transformer Networks

📅 2025-01-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the time-consuming, subjective, and poorly reproducible nature of manual 3D MRI segmentation of the laryngeal–oral vocal tract. We construct the first open-source, manually curated 3D anatomical vocal tract database. Methodologically, we systematically evaluate state-of-the-art CNNs (2D/3D U-Net, ResNet-based encoders) and Vision Transformers (SwinUNETR), incorporating multi-scale loss functions and advanced data augmentation. Experiments on cross-subject 3D segmentation show that 3D SwinUNETR achieves a Dice score of 92.7%, significantly outperforming 2D approaches and validating the advantage of 3D modeling and self-attention mechanisms for segmenting elongated, non-rigid vocal tract structures. Our key contributions are: (1) the first high-quality, open-source 3D vocal tract MRI segmentation dataset with expert anatomical annotations; and (2) the first systematic empirical evidence of the superior generalization of Transformer architectures in 3D speech anatomy segmentation.
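The Dice score reported above is the standard overlap metric for volumetric segmentation. As a minimal sketch (not the paper's evaluation code), it can be computed for two binary voxel masks like so; the array shapes and function name here are illustrative:

```python
import numpy as np

def dice_score(pred, target, eps=1e-8):
    """Dice similarity coefficient: 2|A∩B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy 3D binary masks standing in for voxelised vocal tract segmentations
a = np.zeros((4, 4, 4), dtype=bool)
b = np.zeros((4, 4, 4), dtype=bool)
a[1:3, 1:3, 1:3] = True   # 8 voxels
b[1:3, 1:3, 1:4] = True   # 12 voxels; 8 overlap with a
print(round(dice_score(a, b), 3))  # 2*8 / (8+12) = 0.8
```

A Dice score of 92.7%, as reported for 3D SwinUNETR, means the predicted and expert-annotated volumes overlap almost completely under this measure.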

📝 Abstract
Accurate segmentation of the vocal tract from magnetic resonance imaging (MRI) data is essential for various voice and speech applications. Manual segmentation is time-intensive and susceptible to errors. This study aimed to evaluate the efficacy of deep learning algorithms for automatic vocal tract segmentation from 3D MRI.
Problem

Research questions and friction points this paper is trying to address.

3D MRI Analysis
Automated Recognition
Phonetics Research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Learning
3D Convolutional Networks
Automated MRI Analysis