Detecting Dataset Bias in Medical AI: A Generalized and Modality-Agnostic Auditing Framework

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dataset bias in medical AI often induces shortcut learning and spurious correlations, undermining model fairness and robustness. To address this, the authors propose G-AUDIT (Generalized Attribute Utility and Detectability-Induced bias Testing), a modality-agnostic framework for auditing datasets. For each data attribute, G-AUDIT quantifies two properties: its detectability from the raw inputs and its utility for predicting the task labels. Together, these scores generate targeted hypotheses about sources of bias across image, text, and tabular medical data. Evaluated on dermoscopic skin lesion classification, stigmatizing-language detection in EHR notes, and ICU mortality prediction, G-AUDIT surfaces latent biases overlooked by manual review, strengthening dataset-level risk assessment beyond conventional qualitative ethical audits.
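To make the "attribute utility" idea concrete, here is a minimal sketch, assuming a mutual-information proxy computed on synthetic audit data; the variable names, the synthetic correlation, and the normalized-MI estimator are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch: "attribute utility" as how predictive a metadata attribute
# (e.g., clinical site) is of the task label. Illustrative only.
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)

# Hypothetical audit table: one row per sample, with a task label and
# a candidate bias attribute (here, acquisition site).
labels = rng.integers(0, 2, size=1000)        # e.g., benign vs. malignant
site = np.where(labels == 1,                  # site correlates with label
                rng.integers(0, 3, size=1000),
                rng.integers(1, 4, size=1000))

# Utility proxy: mutual information between attribute and label,
# normalized by label entropy, so 0 = uninformative, 1 = fully predictive.
mi = mutual_info_score(site, labels)
h_y = mutual_info_score(labels, labels)       # MI(Y, Y) equals H(Y)
print(f"attribute utility (normalized MI): {mi / h_y:.3f}")
```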

📝 Abstract
Data-driven AI is establishing itself at the center of evidence-based medicine. However, reports of shortcomings and unexpected behavior are growing due to AI's reliance on association-based learning. A major reason for this behavior is that latent bias in machine learning datasets can be amplified during training and/or hidden during testing. We present a data modality-agnostic auditing framework for generating targeted hypotheses about sources of bias, which we refer to as Generalized Attribute Utility and Detectability-Induced bias Testing (G-AUDIT) for datasets. Our method examines the relationship between task-level annotations and data properties, including protected attributes (e.g., race, age, sex) and environment and acquisition characteristics (e.g., clinical site, imaging protocols). G-AUDIT automatically quantifies the extent to which the observed data attributes may enable shortcut learning, or in the case of testing data, hide predictions made based on spurious associations. We demonstrate the broad applicability and value of our method by analyzing large-scale medical datasets for three distinct modalities and learning tasks: skin lesion classification in images, stigmatizing language classification in Electronic Health Records (EHR), and mortality prediction for ICU tabular data. In each setting, G-AUDIT successfully identifies subtle biases commonly overlooked by traditional qualitative methods that focus primarily on social and ethical objectives, underscoring its practical value in exposing dataset-level risks and supporting the downstream development of reliable AI systems. Our method paves the way for a deeper understanding of machine learning datasets throughout the AI development life-cycle, from initial prototyping all the way to regulation, and creates opportunities to reduce model bias, enabling safer and more trustworthy AI systems.
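The "detectability" side can be sketched similarly: train a probe classifier to recover the attribute from the inputs, and treat well-above-chance performance as a sign that a task model could exploit the attribute as a shortcut. The probe model, synthetic features, and AUC metric below are assumptions for illustration, not the paper's specification.

```python
# Sketch: "attribute detectability" as the cross-validated performance
# of a probe that predicts the attribute from the inputs themselves.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical per-sample features (e.g., image embeddings or EHR
# codes) and a binary attribute such as imaging protocol.
attribute = rng.integers(0, 2, size=500)
features = rng.normal(size=(500, 16))
features[:, 0] += 1.5 * attribute    # the attribute leaks into one feature

# Detectability proxy: cross-validated AUC of an attribute probe;
# 0.5 means the attribute is invisible to the model, 1.0 fully recoverable.
probe = LogisticRegression(max_iter=1000)
auc = cross_val_score(probe, features, attribute,
                      cv=5, scoring="roc_auc").mean()
print(f"attribute detectability (probe AUC): {auc:.3f}")
```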
Problem

Research questions and friction points this paper is trying to address.

Latent bias in machine learning datasets can be amplified during training or hidden during testing.
Association-based learning leaves medical AI models prone to shortcut learning and spurious correlations.
Conventional qualitative audits, focused on social and ethical objectives, overlook subtle dataset-level biases.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-agnostic auditing framework (G-AUDIT) that generates targeted hypotheses about sources of dataset bias
Quantifies each attribute's detectability and task utility to flag shortcut-learning and spurious-association risk (see the sketch after this list)
Demonstrated on dermoscopic images, EHR text, and ICU tabular data, surfacing subtle biases missed by qualitative review
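One way the two scores might feed hypothesis generation is to rank attributes that are both easy to detect and predictive of the label as the most plausible shortcut candidates. The product scoring rule, the example attributes, and the review threshold below are hypothetical, not the paper's scoring procedure.

```python
# Toy combination of the two proxies into a ranked list of bias
# hypotheses. Scores could come from the two sketches above.
audit_scores = {
    # attribute: (detectability, utility) -- illustrative values
    "clinical_site":    (0.91, 0.40),
    "imaging_protocol": (0.83, 0.12),
    "patient_age":      (0.65, 0.05),
}

# Rank by detectability * utility: an attribute must be both visible
# to a model and informative about the label to act as a shortcut.
ranked = sorted(audit_scores.items(),
                key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
for name, (det, util) in ranked:
    flag = "REVIEW" if det * util > 0.05 else "ok"
    print(f"{name:18s} detectability={det:.2f} utility={util:.2f} [{flag}]")
```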