Lost in Distortion: Uncovering the Domain Gap Between Computer Vision and Brain Imaging - A Study on Pretraining for Age Prediction

📅 2025-12-01

🤖 AI Summary

This study investigates how heterogeneity in neuroimaging data quality affects self-supervised pretraining: whether low-quality scans (e.g., motion artifacts, signal dropout) provide useful supervisory signals or impair representation learning. We propose a hierarchical quality-aware contrastive pretraining framework and systematically evaluate pretraining efficacy across multiple quality tiers of brain MRI data, followed by fine-tuning on external cohorts to assess generalizability. Results show that high-quality data substantially improves downstream brain age prediction accuracy (reducing MAE by 12.7%), whereas incorporating low-quality samples degrades representation robustness. Crucially, we uncover fundamental differences between clinical neuroimaging and general computer vision in noise tolerance and domain transfer mechanisms. To our knowledge, this is the first work to quantitatively demonstrate that domain-adapted data curation is essential for building trustworthy foundation models in neuroimaging, providing both theoretical grounding and practical guidelines for medical AI pretraining paradigms.

📝 Abstract

Large-scale brain imaging datasets provide unprecedented opportunities for developing domain foundation models through pretraining. However, unlike natural image datasets in computer vision, neuroimaging data often vary widely in quality, ranging from well-structured scans to severely distorted or incomplete brain volumes. This raises a fundamental question: can noisy or low-quality scans contribute meaningfully to pretraining, or do they instead hinder model learning? In this study, we systematically explore the role of data quality in pretraining and its impact on downstream tasks. Specifically, we pretrain on datasets of different quality levels and fine-tune for brain age prediction on external cohorts. Our results show significant performance differences across quality levels, revealing both opportunities and limitations. We further discuss the gap between computer vision practices and clinical neuroimaging standards, emphasizing the necessity of domain-aware curation to ensure trustworthy and generalizable domain-specific foundation models.
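
The evaluation protocol described above (group scans by quality level, pretrain on each group, then compare brain age error after fine-tuning) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the quality scores in [0, 1], the tier cutoffs, and the helper names `assign_tier`, `split_by_tier`, `mean_absolute_error`, and `relative_mae_reduction` are hypothetical and do not come from the paper. The pretraining and fine-tuning steps themselves are omitted.

```python
from collections import defaultdict
from statistics import mean


def assign_tier(quality_score, thresholds=(0.33, 0.66)):
    """Map a quality score in [0, 1] to a tier label.

    The cutoffs are hypothetical; the paper's actual quality criteria
    are not stated in the abstract.
    """
    low, high = thresholds
    if quality_score < low:
        return "low"
    if quality_score < high:
        return "medium"
    return "high"


def split_by_tier(scans):
    """Group (scan_id, quality_score) pairs into per-tier pretraining subsets."""
    tiers = defaultdict(list)
    for scan_id, score in scans:
        tiers[assign_tier(score)].append(scan_id)
    return dict(tiers)


def mean_absolute_error(true_ages, predicted_ages):
    """Mean absolute error between chronological and predicted ages (years)."""
    return mean(abs(t - p) for t, p in zip(true_ages, predicted_ages))


def relative_mae_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae improves on baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae
```

In use, one model would be pretrained per subset returned by `split_by_tier`, each would be fine-tuned on the same external cohort, and `relative_mae_reduction` would compare the resulting errors against a chosen baseline. A positive value means the new model has lower error than the baseline.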

Problem

Research questions and friction points this paper is trying to address.

Explores how data quality affects pretraining for brain age prediction

Investigates whether noisy scans help or hinder model learning in neuroimaging

Examines domain gap between computer vision and clinical imaging standards

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pretraining with varied quality neuroimaging data

Systematic analysis of how data quality affects downstream tasks

Domain-aware curation for clinical neuroimaging models
