🤖 AI Summary
Contemporary multimodal medical decision-making models face two interdependent challenges: imbalanced modality learning (e.g., disproportionate contributions from imaging versus textual data) and insufficient group fairness (e.g., performance disparities across gender or racial subgroups). To address these jointly, we propose a novel dual-level gradient modulation mechanism that simultaneously regulates gradient direction and magnitude at both the modality level and the subgroup level during optimization—thereby unifying multimodal balance and algorithmic fairness for the first time. Our approach employs gradient redirection and magnitude balancing to dynamically constrain multimodal learning trajectories, jointly optimizing classification accuracy and cross-group fairness. Evaluated on two benchmark multimodal medical datasets, our method achieves significant improvements over state-of-the-art approaches: classification accuracy increases by 2.3–4.1%, while inter-group performance gaps—measured by ΔEO and ΔDP—decrease by 37–52%.
📝 Abstract
Medical decision systems increasingly rely on data from multiple sources to ensure reliable and unbiased diagnoses. However, existing multimodal learning models often fall short of this goal because they overlook two critical challenges. First, different data modalities may be learned at uneven rates, so the model converges to a state biased toward certain modalities. Second, the model may concentrate its learning on certain demographic groups, causing unfair performance across groups. These two issues can reinforce each other, since different data modalities may favor different groups during optimization, yielding multimodal learning that is both imbalanced and unfair. This paper proposes MultiFair, a novel approach for multimodal medical classification that addresses both challenges with a dual-level gradient modulation process. MultiFair dynamically modulates the direction and magnitude of training gradients at both the data-modality level and the demographic-group level. Extensive experiments on two multimodal medical datasets with different demographic groups show that MultiFair outperforms state-of-the-art multimodal learning and fairness-aware learning methods.
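The abstract does not spell out how the dual-level modulation operates, but the idea of jointly adjusting gradient magnitude (across modalities) and direction (across groups) can be illustrated with a minimal sketch. The sketch below is one plausible reading, not the paper's actual algorithm: it assumes magnitude balancing rescales each modality's gradient toward the mean norm, and direction modulation uses a PCGrad-style conflict projection against the worst-off group's gradient. All function names here are hypothetical.

```python
import numpy as np

def balance_magnitudes(modality_grads):
    """Level 1 (modality): rescale each modality's gradient to the mean
    norm across modalities, so no single modality dominates the update."""
    norms = np.array([np.linalg.norm(g) for g in modality_grads])
    target = norms.mean()
    return [g * (target / n) if n > 0 else g
            for g, n in zip(modality_grads, norms)]

def redirect(g, g_ref):
    """If g conflicts with g_ref (negative dot product), project out the
    conflicting component (PCGrad-style projection)."""
    dot = g @ g_ref
    if dot < 0:
        g = g - (dot / (g_ref @ g_ref)) * g_ref
    return g

def dual_level_modulate(modality_grads, group_grads):
    """Combine both levels: balance modality gradient magnitudes, then
    redirect each balanced gradient so it no longer opposes the gradient
    of the worst-off demographic group (taken here, as a simplifying
    assumption, to be the group with the largest loss gradient)."""
    balanced = balance_magnitudes(modality_grads)
    worst = group_grads[int(np.argmax([np.linalg.norm(g)
                                       for g in group_grads]))]
    return [redirect(g, worst) for g in balanced]
```

After modulation, every modality contributes an update of comparable scale, and no update direction increases the worst-off group's loss, which is the intuition behind jointly optimizing accuracy and cross-group fairness.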