Fairness Evolution in Continual Learning for Medical Imaging

📅 2024-04-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study investigates how fairness evolves in continual learning (CL) for medical imaging, examining how predictive bias toward sensitive subpopulations—such as age and gender groups—shifts during class-incremental learning on chest X-rays. It introduces domain-specific fairness metrics into CL evaluation and systematically compares four CL strategies—Replay, Learning without Forgetting (LwF), LwF Replay, and Pseudo-Label—across a five-task, 12-pathology classification setup using CheXpert and ChestX-ray14. Results show that LwF and Pseudo-Label achieve the best classification performance, and that Pseudo-Label additionally exhibits the smallest inter-group predictive disparity. This work establishes a fairness-aware evaluation paradigm and provides empirical evidence for developing clinically deployable, fair continual learning systems.

📝 Abstract
Deep Learning (DL) has made significant strides in various medical applications in recent years, achieving remarkable results. In the field of medical imaging, DL models can assist doctors in disease diagnosis by classifying pathologies in Chest X-ray images. However, training on new data to expand model capabilities and adapt to distribution shifts is a notable challenge these models face. Continual Learning (CL) has emerged as a solution to this challenge, enabling models to adapt to new data while retaining knowledge gained from previous experiences. Previous studies have analyzed the behavior of CL strategies in medical imaging regarding classification performance. However, when considering models that interact with sensitive information, such as in the medical domain, it is imperative to disaggregate the performance of socially salient groups. Indeed, DL algorithms can exhibit biases against certain sub-populations, leading to discrepancies in predictive performance across different groups identified by sensitive attributes such as age, race/ethnicity, sex/gender, and socioeconomic status. In this study, we go beyond the typical assessment of classification performance in CL and study bias evolution over successive tasks with domain-specific fairness metrics. Specifically, we evaluate the CL strategies using the well-known CheXpert (CXP) and ChestX-ray14 (NIH) datasets. We consider a class incremental scenario of five tasks with 12 pathologies. We evaluate the Replay, Learning without Forgetting (LwF), LwF Replay, and Pseudo-Label strategies. LwF and Pseudo-Label exhibit optimal classification performance, but when including fairness metrics in the evaluation, it is clear that Pseudo-Label is less biased. For this reason, this strategy should be preferred when considering real-world scenarios in which it is crucial to consider the fairness of the model.
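The inter-group predictive disparity the abstract refers to can be quantified with subgroup fairness metrics such as the gap in true-positive rate across sensitive groups (an equal-opportunity violation). The sketch below is illustrative only—the function name, threshold-free formulation, and toy data are assumptions, not the paper's exact metric definitions.

```python
import numpy as np

def tpr_gap(y_true, y_pred, groups):
    """Largest pairwise gap in true-positive rate across subgroups.

    A large gap means the model under-detects disease in some groups.
    Tracking this value after each CL task shows how bias evolves as
    new pathologies are learned.
    """
    tprs = []
    for g in np.unique(groups):
        positives = (groups == g) & (y_true == 1)
        if positives.sum() == 0:
            continue  # skip groups with no positive cases
        tprs.append(y_pred[positives].mean())
    return max(tprs) - min(tprs)

# Toy binary predictions for two hypothetical age groups.
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1])
y_pred = np.array([1, 1, 0, 1, 0, 1, 0, 0])
groups = np.array(["<60", "<60", "<60", "<60", "<60", ">=60", ">=60", ">=60"])
print(tpr_gap(y_true, y_pred, groups))  # 0.75: TPR is 0.75 for "<60", 0.0 for ">=60"
```

In a multi-label setting like the 12-pathology setup here, such a gap would be computed per pathology and per sensitive attribute, then monitored across the five tasks.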
Problem

Research questions and friction points this paper addresses.

Examines how bias toward sensitive subgroups (age, gender) evolves over successive tasks in continual learning for chest X-ray classification
Evaluates the fairness impact of different CL strategies, not only their classification performance
Addresses the need to disaggregate predictive performance across socially salient groups in healthcare applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates domain-specific fairness metrics into the evaluation of continual learning for medical imaging
Studies bias evolution across a five-task class-incremental scenario with 12 pathologies on CheXpert and ChestX-ray14
Benchmarks Replay, Learning without Forgetting (LwF), LwF Replay, and Pseudo-Label, identifying Pseudo-Label as the least biased
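The Pseudo-Label strategy singled out above mitigates forgetting by letting the previous task's frozen model annotate new data for old classes. A minimal sketch of that idea for multi-label targets, assuming a simple thresholding scheme—the function, class indexing, and threshold are hypothetical, not the paper's implementation:

```python
import numpy as np

def build_targets(new_labels, old_model_probs, old_classes, n_classes, thresh=0.5):
    """Assemble a multi-label training target for a new-task sample.

    Ground-truth annotations exist only for the current task's
    pathologies; for previously learned pathologies, the frozen old
    model's thresholded predictions act as pseudo-labels, so knowledge
    of old classes is carried into training on the new task.
    """
    targets = np.zeros(n_classes)
    # Pseudo-labels for old classes from the previous model's probabilities.
    for c in old_classes:
        targets[c] = float(old_model_probs[c] >= thresh)
    # Real annotations for the new task's classes.
    for c, v in new_labels.items():
        targets[c] = v
    return targets

# Old model covers classes 0-2; the new task annotates classes 3-4.
t = build_targets({3: 1, 4: 0}, np.array([0.9, 0.2, 0.6, 0.1, 0.8]),
                  old_classes=[0, 1, 2], n_classes=5)
print(t)  # [1. 0. 1. 1. 0.]
```

The new model is then trained against these combined targets with an ordinary multi-label loss, without storing any old-task images—which is presumably why the strategy avoids the replay buffer's biases.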