🤖 AI Summary
Existing medical imaging datasets lack dense, diagnosis-reasoning–oriented attribute annotations, severely hindering the development and evaluation of eXplainable AI (xAI) models.
Method: We propose the first parameterized synthetic medical image dataset framework designed specifically for xAI, using controllably generated abstract lung nodule–like shapes to precisely model visual attributes—including roundness, margin sharpness, and spiculation—and their explicit, rule-based mappings to diagnostic labels.
Contribution/Results: The framework provides decoupled control over diagnostic logic, attribute composition, and data complexity, supporting model-agnostic, attribute-level attribution evaluation and attention-alignment analysis. Experiments demonstrate its capability to accurately determine whether models base their decisions on semantically correct features. It provides a reproducible, scalable benchmark for evaluating diverse xAI methods—enabling rigorous, fine-grained assessment of interpretability mechanisms in medical AI.
📝 Abstract
Densely annotated medical image datasets that capture not only diagnostic labels but also the underlying reasoning behind these diagnoses are scarce. Such reasoning-related annotations are essential for developing and evaluating explainable AI (xAI) models that reason similarly to radiologists: making correct predictions for the right reasons. To address this gap, we introduce FunnyNodules, a fully parameterized synthetic dataset designed for systematic analysis of attribute-based reasoning in medical AI models. The dataset generates abstract, lung nodule–like shapes with controllable visual attributes such as roundness, margin sharpness, and spiculation. The target class is derived from a predefined attribute combination, allowing full control over the decision rule that links attributes to the diagnostic class. We demonstrate how FunnyNodules can be used in model-agnostic evaluations to assess whether models learn correct attribute–target relations, to interpret over- or underperformance in attribute prediction, and to analyze attention alignment with attribute-specific regions of interest. The framework is fully customizable, supporting variations in dataset complexity, target definitions, class balance, and more. With complete ground truth information, FunnyNodules provides a versatile foundation for developing, benchmarking, and conducting in-depth analyses of explainable AI methods in medical image analysis.
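The core idea described above—sampling controllable attributes and deriving the target class from an explicit, rule-based attribute combination—can be sketched as follows. This is a minimal illustration, not the authors' implementation: the attribute names follow the abstract, but the specific thresholds, the decision rule, and all function names (`decision_rule`, `generate_dataset`) are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class NoduleSample:
    roundness: float         # 1.0 = perfectly round, 0.0 = highly irregular
    margin_sharpness: float  # 1.0 = crisp edge, 0.0 = blurry edge
    spiculation: float       # 1.0 = heavily spiculated, 0.0 = smooth
    label: int               # 1 = "malignant-like", 0 = "benign-like"

def decision_rule(roundness: float, margin_sharpness: float,
                  spiculation: float) -> int:
    # Hypothetical predefined rule: a nodule is "malignant-like" when it is
    # irregular, blurry-edged, and spiculated. Because the rule is explicit,
    # an xAI method can be checked against it attribute by attribute.
    return int(roundness < 0.5 and margin_sharpness < 0.5 and spiculation > 0.5)

def generate_dataset(n: int, seed: int = 0) -> list[NoduleSample]:
    # Attributes are sampled independently; the label is fully determined
    # by the rule, so every sample's "ground-truth reasoning" is known.
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        r, m, s = rng.random(), rng.random(), rng.random()
        samples.append(NoduleSample(r, m, s, decision_rule(r, m, s)))
    return samples

dataset = generate_dataset(1000)
# Every label is recoverable from the attributes alone:
assert all(
    d.label == decision_rule(d.roundness, d.margin_sharpness, d.spiculation)
    for d in dataset
)
```

In the actual framework each sample would additionally be rendered as an abstract nodule-like image whose appearance reflects the attribute values; the sketch only shows the attribute–label layer that makes the decision rule fully controllable.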