🤖 AI Summary
This work addresses uneven performance across demographic or clinical subgroups in medical image segmentation. Existing fairness methods mostly optimize average subgroup performance and therefore overlook hard examples inside a subgroup, a problem the authors call "intra-group hidden failure." To tackle this, the authors propose DuetFair, a dual-axis fairness framework that jointly models inter-subgroup adaptation and intra-subgroup robustness. Building on it, they introduce FairDRO, which combines a distribution-aware mixture-of-experts (dMoE) architecture with subgroup-conditioned distributionally robust optimization (DRO) loss aggregation. Evaluated on three medical image segmentation benchmarks, the method improves worst-subgroup performance; on a 3D radiotherapy dataset, worst-group Dice rises by up to 4.1 points (↑7.4%) over the strongest baseline under institution-based grouping.
📝 Abstract
Medical image segmentation models can perform unevenly across subgroups. Most existing fairness methods focus on improving average subgroup performance, implicitly treating each subgroup as internally homogeneous. However, this can hide difficult cases within a subgroup, where high-loss samples are obscured by the subgroup mean. We call this problem \textbf{intra-group hidden failure}. To address it, we propose the \textbf{DuetFair} mechanism, a dual-axis fairness framework that jointly considers inter-subgroup adaptation and intra-subgroup robustness. Based on DuetFair, we introduce \textbf{FairDRO}, which combines a distribution-aware mixture-of-experts (dMoE) with subgroup-conditioned distributionally robust optimization (DRO) loss aggregation. This design allows the model to adapt across subgroups while also reducing hidden failures within each subgroup. We evaluate FairDRO on three medical image segmentation benchmarks with varying degrees of within-group heterogeneity. FairDRO achieves the best equity-scaled performance on Harvard-FairSeg and improves worst-case subgroup performance on HAM10000 under both age- and race-based grouping schemes. On the 3D radiotherapy target cohort, FairDRO further improves worst-group Dice over the strongest baseline by 3.5 points ($\uparrow 6.0\%$) under the tumor-stage grouping and by 4.1 points ($\uparrow 7.4\%$) under the institution grouping.
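The abstract does not spell out how the subgroup-conditioned DRO aggregation is computed, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a CVaR-style rule (average the worst `alpha` fraction of per-sample losses inside each subgroup, then average across subgroups), which is one common instance of DRO. The function names `cvar` and `subgroup_conditioned_dro`, the parameter `alpha`, and the toy losses are all hypothetical.

```python
import math


def cvar(losses, alpha):
    """Tail average: mean of the worst ceil(alpha * n) losses.

    Assumption: CVaR stands in for the paper's (unspecified) DRO objective.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def subgroup_conditioned_dro(losses, groups, alpha=0.5):
    """Aggregate per-sample losses with a tail average inside each subgroup.

    Returns (overall loss, per-subgroup tail losses). With alpha=1.0 this
    reduces to the plain mean of subgroup means.
    """
    by_group = {}
    for loss, group in zip(losses, groups):
        by_group.setdefault(group, []).append(loss)
    per_group = {g: cvar(v, alpha) for g, v in by_group.items()}
    overall = sum(per_group.values()) / len(per_group)
    return overall, per_group


# Toy data (made up): both subgroups have the same mean loss,
# but subgroup "A" hides one hard sample behind three easy ones.
losses = [0.1, 0.1, 0.1, 0.9, 0.3, 0.3, 0.3, 0.3]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

mean_total, mean_per_group = subgroup_conditioned_dro(losses, groups, alpha=1.0)
dro_total, dro_per_group = subgroup_conditioned_dro(losses, groups, alpha=0.5)
print("subgroup means:", mean_per_group)
print("subgroup tail losses:", dro_per_group)
```

In this toy case the subgroup means are identical, so a mean-based fairness objective sees no gap, while the tail average for subgroup "A" is higher because its single high-loss sample is no longer diluted. That is the kind of intra-group hidden failure the abstract describes.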