🤖 AI Summary
This work addresses air-tissue boundary (ATB) segmentation in low-resource real-time MRI (rtMRI) videos for vocal tract motion analysis. We investigate how the scale of the pretraining and fine-tuning datasets affects segmentation performance, and we propose a "15-frame cross-domain adaptation" paradigm that enables lightweight model adaptation using only 15 target-domain rtMRI frames. An evaluation framework is established that benchmarks performance against matched-condition baselines. Using SegNet/UNet architectures, we employ staged pretraining followed by pixel-wise supervised fine-tuning, with performance quantified via Pixel-wise Classification Accuracy (PCA) and the Dice coefficient. On novel subjects from the same source, our method surpasses the matched baseline by 0.33% (PCA) and 0.91% (Dice); on cross-source data, it achieves 99.63% (PCA) and 98.09% (Dice) of the baseline performance, demonstrating strong generalization from minimal target samples. Our core contributions are: (i) establishing a data-efficiency lower bound for low-resource rtMRI segmentation, and (ii) proposing a reproducible, condition-matched evaluation standard.
📝 Abstract
Real-time Magnetic Resonance Imaging (rtMRI) is frequently used in speech production studies as it provides a complete view of the vocal tract during articulation. This study investigates the effectiveness of rtMRI for analyzing vocal tract movements by employing the SegNet and UNet models for Air-Tissue Boundary (ATB) segmentation tasks. We pretrained several base models with increasing numbers of subjects and videos and assessed their performance on two datasets. The first consists of unseen videos from unseen subjects within the same data source, on which our models achieved results 0.33% and 0.91% better than the matched condition (Pixel-wise Classification Accuracy (PCA) and Dice coefficient, respectively). The second comprises unseen videos from a new data source, on which we obtained 99.63% and 98.09% of the matched-condition performance (PCA and Dice coefficient, respectively). Here, matched-condition performance refers to the performance of a model trained only on the test subjects, which serves as a benchmark for the other models. Our findings highlight the significance of fine-tuning and adapting models with limited data. Notably, we demonstrated that effective model adaptation can be achieved with as few as 15 rtMRI frames from a new dataset.
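The two evaluation metrics above, Pixel-wise Classification Accuracy (PCA) and the Dice coefficient, are standard overlap measures for segmentation masks. As a minimal sketch (not the paper's code; the function names and toy masks are illustrative), they can be computed for binary ATB masks as follows:

```python
import numpy as np

def pixelwise_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float((pred == target).mean())

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    """Dice overlap for binary masks: 2|A ∩ B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + target.sum() + eps))

# Toy 4x4 binary masks (1 = tissue, 0 = air); one pixel disagrees.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
target = np.array([[1, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 0, 0],
                   [0, 0, 0, 0]])

print(pixelwise_accuracy(pred, target))  # 15 of 16 pixels agree -> 0.9375
print(dice_coefficient(pred, target))    # 2*3 / (4 + 3) ~ 0.857
```

In practice these metrics are averaged over all frames of a test video; the matched-condition comparison in the paper reports exactly such per-dataset averages.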