🤖 AI Summary
This work addresses performance disparities in automated chest CT diagnosis that arise when class imbalance coincides with underrepresentation of demographic subgroups. The authors propose a joint optimization framework that operates at two levels: at the sample level, a logit-adjusted cross-entropy loss corrects for class frequency bias, while at the group level, Conditional Value-at-Risk (CVaR) aggregation prioritizes the demographic group with the higher loss. The combined objective targets both better recognition of rare diseases and smaller performance gaps for underrepresented patients. Evaluated on the Fair Disease Diagnosis benchmark, the method achieves a gender-averaged macro F1 score of 0.8403 and a fairness gap of 0.0239, a 13.3% improvement in score and a 78% reduction in demographic disparity relative to the baseline.
📝 Abstract
Automated diagnosis from chest CT has improved considerably with deep learning, but models trained on skewed datasets tend to perform unevenly across patient demographics. The problem goes beyond demographic bias alone: in clinical data, class imbalance and group underrepresentation often coincide, creating compound failure modes that neither standard rebalancing nor fairness corrections can fix on their own. We introduce a two-level objective that targets both axes of this problem. Logit-adjusted cross-entropy loss operates at the sample level, shifting decision margins by class frequency with provable consistency guarantees. Conditional Value at Risk (CVaR) aggregation operates at the group level, directing optimization pressure toward whichever demographic group currently has the higher loss. We evaluate on the Fair Disease Diagnosis benchmark using a 3D ResNet-18 pretrained on Kinetics-400, classifying CT volumes annotated with patient sex into four classes: Adenocarcinoma, Squamous Cell Carcinoma, COVID-19, and Normal. The training set illustrates the compound problem concretely: Squamous Cell Carcinoma has 84 samples in total, only 5 of them from female patients. The combined loss reaches a gender-averaged macro F1 of 0.8403 with a fairness gap of 0.0239, a 13.3% improvement in score and a 78% reduction in demographic disparity over the baseline. Ablations show that each component alone falls short. The code is publicly available at https://github.com/Purdue-M2/Fair-Disease-Diagnosis.
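The two-level objective described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The function names, the temperature `tau`, the tail level `alpha`, and the equal weighting of groups are all hypothetical choices made for the example; the abstract does not specify them.

```python
import numpy as np


def logit_adjusted_ce(logits, labels, class_priors, tau=1.0):
    """Per-sample cross-entropy on logits shifted by tau * log(prior).

    Assumption: the standard additive form of logit adjustment.
    Rare classes receive a larger loss for the same raw logits,
    which pushes the decision margin in their favor.
    """
    adjusted = logits + tau * np.log(class_priors)
    # Subtract the row-wise max for numerical stability.
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)
    log_probs = adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]


def cvar(group_losses, alpha):
    """CVaR over equally weighted groups: mean of the worst alpha fraction.

    alpha = 1.0 gives the plain average of the group losses; with two
    groups, alpha = 0.5 gives the loss of the worse-off group.
    """
    losses = np.sort(np.asarray(group_losses, dtype=float))[::-1]
    budget = alpha * len(losses)  # tail mass, measured in groups
    weights = np.clip(budget - np.arange(len(losses)), 0.0, 1.0)
    return float(np.dot(weights, losses) / budget)


def two_level_loss(logits, labels, groups, class_priors, alpha=0.5, tau=1.0):
    """Sample level: logit-adjusted CE. Group level: CVaR of group means."""
    per_sample = logit_adjusted_ce(logits, labels, class_priors, tau)
    group_means = [per_sample[groups == g].mean() for g in np.unique(groups)]
    return cvar(group_means, alpha)


# Toy usage: 4 classes, 2 demographic groups, made-up priors and logits.
rng = np.random.default_rng(0)
toy_logits = rng.normal(size=(8, 4))
toy_labels = np.array([0, 1, 2, 3, 0, 1, 2, 3])
toy_groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_priors = np.array([0.4, 0.1, 0.3, 0.2])  # hypothetical class frequencies
print(two_level_loss(toy_logits, toy_labels, toy_groups, toy_priors))
```

With two groups and `alpha = 0.5`, the group-level term reduces to the loss of the worse-off group, which matches the abstract's description of directing optimization pressure toward whichever group currently has the higher loss. Intermediate values of `alpha` interpolate between that worst-case objective and the plain average.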