🤖 AI Summary
This study identifies a critical practice gap in medical image classification: machine learning researchers often rely on intuition rather than systematic criteria to select source datasets for transfer learning, potentially compromising model generalizability and clinical reliability. Method: Through a task-based human–computer interaction (HCI) study grounded in an established HCI analytical framework, we empirically investigate practitioners’ decision-making processes, challenging the conventional “source–target similarity maximization” assumption. Contribution/Results: We find that perceived dataset similarity and expected downstream performance are not always aligned. Source selection is task-dependent and shaped by three kinds of factors: community conventions, dataset meta-attributes (e.g., annotation granularity, imaging protocols), and similarity, whether computed from data embeddings or perceived visually or semantically. We distill implicit heuristic rules guiding selection and identify terminological ambiguity as a barrier to interdisciplinary collaboration. These findings provide practical insights and design directions for developing interpretable, reproducible frameworks for source dataset selection in medical transfer learning.
📝 Abstract
Transfer learning is crucial for medical imaging, yet the selection of source datasets, which can affect the generalizability of algorithms and thus patient outcomes, often relies on researchers' intuition rather than systematic principles. This study investigates these decisions through a task-based survey with machine learning practitioners. Unlike prior work that benchmarks models and experimental setups, we take a human-centered HCI perspective on how practitioners select source datasets. Our findings indicate that choices are task-dependent and influenced by community practices, dataset properties, and similarity, whether computational (data embedding) or perceived (visual or semantic). However, similarity ratings and expected performance are not always aligned, challenging the traditional "more similar is better" view. Participants often used ambiguous terminology, which suggests a need for clearer definitions and for HCI tools that make practitioners' selection heuristics explicit and usable. By clarifying these heuristics, this work provides practical insights for more systematic source selection in transfer learning.