Disentanglement and Assessment of Shortcuts in Ophthalmological Retinal Imaging Exams

📅 2025-07-13
🤖 AI Summary
This study addresses algorithmic fairness in AI-based diabetic retinopathy (DR) screening. Using the mBRSET fundus image dataset, it evaluates the trade-off between predictive performance (up to 94% AUROC) and bias with respect to two sensitive attributes, age and sex, across three vision architectures: ConvNeXt V2, DINOv2, and Swin V2. The models show architecture-specific entanglement between disease-relevant features and sensitive attributes. Applying feature disentanglement then shows that its effect is highly architecture-dependent: DINOv2 gains 2% AUROC after disentanglement, whereas ConvNeXt V2 and Swin V2 lose 7% and 3%, respectively. The results provide empirical evidence that fairness interventions in medical AI should be chosen with the model architecture in mind.

📝 Abstract
Diabetic retinopathy (DR) is a leading cause of vision loss in working-age adults. While screening reduces the risk of blindness, traditional imaging is often costly and inaccessible. Artificial intelligence (AI) algorithms present a scalable diagnostic solution, but concerns regarding fairness and generalization persist. This work evaluates the fairness and performance of image-trained models in DR prediction, as well as the impact of disentanglement as a bias mitigation technique, using the diverse mBRSET fundus dataset. Three models, ConvNeXt V2, DINOv2, and Swin V2, were trained on macula images to predict DR and sensitive attributes (SAs) (e.g., age and gender/sex). Fairness was assessed between subgroups of SAs, and disentanglement was applied to reduce bias. All models achieved high DR prediction performance (up to 94% AUROC) and could reasonably predict age and gender/sex (91% and 77% AUROC, respectively). Fairness assessment suggests disparities, such as a 10% AUROC gap between age groups in DINOv2. Disentangling SAs from DR prediction had varying results, depending on the model selected. Disentanglement improved DINOv2 performance (2% AUROC gain), but led to performance drops in ConvNeXt V2 and Swin V2 (7% and 3%, respectively). These findings highlight the complexity of disentangling fine-grained features in fundus imaging and emphasize the importance of fairness in medical imaging AI to ensure equitable and reliable healthcare solutions.
Problem

Research questions and friction points this paper is trying to address.

Evaluates fairness and performance of AI in diabetic retinopathy prediction
Assesses impact of disentanglement to mitigate bias in medical imaging
Explores disparities in model performance across sensitive attributes
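
A subgroup disparity such as the AUROC gap between age groups reported in the abstract can be illustrated with a short sketch. The code below is a minimal illustration, not the paper's evaluation code: the function names `auroc` and `subgroup_auroc_gap`, the group labels, and the toy scores are all assumptions made for the example.

```python
from collections import defaultdict


def auroc(labels, scores):
    """AUROC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def subgroup_auroc_gap(labels, scores, groups):
    """Return per-subgroup AUROC and the gap between best and worst subgroup."""
    by_group = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        by_group[g][0].append(y)
        by_group[g][1].append(s)
    per_group = {g: auroc(ys, ss) for g, (ys, ss) in by_group.items()}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Toy data (hypothetical): DR labels, model scores, and an age-group attribute.
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.1, 0.8, 0.3, 0.6, 0.7, 0.4, 0.2]
groups = ["young"] * 4 + ["old"] * 4

per_group, gap = subgroup_auroc_gap(labels, scores, groups)
print(per_group, gap)
```

On real data the per-subgroup AUROC would normally be reported with confidence intervals, since small subgroups can produce large gaps by chance.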
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI models predict diabetic retinopathy from macula images
Disentanglement technique reduces bias in DR prediction
Fairness assessment reveals disparities across sensitive attributes
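
The summary above does not say which disentanglement objective the authors use. As a loudly labeled assumption, the sketch below shows one common decorrelation-style idea: split an embedding into a disease block and a sensitive-attribute block, then penalize the squared cross-covariance between them. The names `column_center` and `cross_covariance_penalty` and the toy embeddings are hypothetical.

```python
def column_center(rows):
    """Subtract each column's mean from a list-of-rows matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def cross_covariance_penalty(z_disease, z_sensitive):
    """Sum of squared cross-covariances between two embedding blocks.

    ASSUMPTION: this is a generic decorrelation penalty used for illustration,
    not the objective from the paper. A value of 0 means the two blocks are
    linearly uncorrelated on this batch.
    """
    if len(z_disease) != len(z_sensitive) or len(z_disease) < 2:
        raise ValueError("both blocks need the same batch size, at least 2")
    n = len(z_disease)
    a = column_center(z_disease)
    b = column_center(z_sensitive)
    penalty = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            cov = sum(x * y for x, y in zip(col_a, col_b)) / (n - 1)
            penalty += cov * cov
    return penalty


# Toy one-dimensional embedding blocks for a batch of four images (hypothetical).
z_disease = [[1.0], [-1.0], [1.0], [-1.0]]
z_uncorrelated = [[1.0], [1.0], [-1.0], [-1.0]]
z_entangled = [[1.0], [-1.0], [1.0], [-1.0]]

penalty_uncorrelated = cross_covariance_penalty(z_disease, z_uncorrelated)
penalty_entangled = cross_covariance_penalty(z_disease, z_entangled)
print(penalty_uncorrelated, penalty_entangled)
```

In training, a term like this would typically be added to the DR classification loss with a weighting factor. It only removes linear dependence, which is one reason results can differ between architectures.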
Leonor Fernandes
Faculdade de Engenharia, Universidade do Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
Tiago Gonçalves
Faculdade de Engenharia, Universidade do Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal
João Matos
University of Oxford
Machine Learning, Fairness, Medical Statistics, Artificial Intelligence
Luis Filipe Nakayama
Visiting Student, Massachusetts Institute of Technology
Ophthalmology, Retina, Artificial Intelligence, Data Science
Jaime S. Cardoso
Faculdade de Engenharia, Universidade do Porto, Portugal; Institute for Systems and Computer Engineering, Technology and Science, Porto, Portugal