Intuitions of Machine Learning Researchers about Transfer Learning for Medical Image Classification

📅 2025-10-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study identifies a practice gap in medical image classification: machine learning researchers often rely on intuition, rather than systematic criteria, to select source datasets for transfer learning, which can compromise model generalizability and clinical reliability. Method: Through a task-based survey of machine learning practitioners, grounded in an established human-computer interaction (HCI) analytical framework, we investigate practitioners' decision-making processes and question the conventional assumption that maximizing source-target similarity is always best. Contribution/Results: We find that perceived dataset similarity and expected downstream performance are not always aligned. Source selection is task-dependent and shaped by community conventions, dataset meta-attributes (e.g., annotation granularity, imaging protocols), and similarity, whether computed from data embeddings or perceived visually or semantically. We distill the implicit heuristics that guide selection and identify ambiguous terminology as a barrier to interdisciplinary collaboration. These findings offer practical insights and design directions for more systematic, explicit source dataset selection in medical transfer learning.

📝 Abstract

Transfer learning is crucial for medical imaging, yet the selection of source datasets, which can affect the generalizability of algorithms and thus patient outcomes, often relies on researchers' intuition rather than systematic principles. This study investigates these decisions through a task-based survey of machine learning practitioners. Unlike prior work that benchmarks models and experimental setups, we take a human-centered HCI perspective on how practitioners select source datasets. Our findings indicate that choices are task-dependent and influenced by community practices, dataset properties, and similarity, whether computational (based on data embeddings) or perceived (visual or semantic). However, similarity ratings and expected performance are not always aligned, which challenges the traditional "more similar is better" view. Participants often used ambiguous terminology, which suggests a need for clearer definitions and for HCI tools that make these definitions explicit and usable. By clarifying these heuristics, this work provides practical insights for more systematic source selection in transfer learning.
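
One way to make the mismatch between similarity ratings and expected performance concrete is to rank-correlate the two. The sketch below is not the paper's analysis; it is a minimal, standard-library illustration of Spearman rank correlation. The dataset labels, the 1-5 similarity ratings, and the performance scores are all made-up placeholder values, and the function names `ranks` and `spearman` are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Placeholder values for five hypothetical candidate source datasets.
# These numbers are invented for illustration and are NOT from the study.
candidates = ["source_A", "source_B", "source_C", "source_D", "source_E"]
similarity_rating = [5, 4, 3, 2, 1]                      # perceived similarity, 1-5
expected_performance = [0.81, 0.86, 0.84, 0.79, 0.80]    # e.g. an expected score

rho = spearman(similarity_rating, expected_performance)
print(f"Spearman rho between similarity and expected performance: {rho:.2f}")
```

A coefficient well below 1 would indicate that the candidate rated most similar is not necessarily the one expected to perform best, which is the kind of misalignment the abstract describes. Rank correlation is used here only because ratings are ordinal; other agreement measures would be equally reasonable.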

Problem

Research questions and friction points this paper is trying to address.

Investigates how researchers select source datasets for medical imaging transfer learning

Challenges the traditional assumption that more similar datasets yield better performance

Identifies ambiguous terminology and the need for clearer definitions and selection principles

Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveyed ML practitioners' dataset selection heuristics

Identified task-dependent factors influencing source dataset selection for transfer learning

Identified a need for HCI tools that make similarity definitions explicit and usable
