🤖 AI Summary
Deep neural networks (DNNs) often yield unfair predictions with respect to sensitive attributes (e.g., race, gender) due to biases inherent in training data, posing significant risks in high-stakes decision-making. To address this, we propose the first fairness-aware method that integrates fairness considerations directly into neuron-level fault localization and repair. Our approach leverages input-output relationship analysis to design a fairness-driven neuron importance scoring mechanism, enabling precise identification and targeted adjustment of the neuron weights most strongly correlated with sensitive attributes, thereby optimizing fairness metrics such as equalized odds. Evaluated across multiple image and tabular benchmarks, our method significantly outperforms state-of-the-art fairness repair techniques: it improves subgroup fairness more effectively while preserving model accuracy, and it offers both computational efficiency and interpretability through transparent, neuron-level interventions.
📝 Abstract
Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions. For instance, a model may exhibit different misclassification rates for white and Black sub-populations. Effectively and efficiently identifying and correcting such biased behavior in DNNs remains a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and on four tabular datasets using a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both the fault localization and repair stages. Our findings also show that FairFLRep repairs the network more efficiently than the baseline approaches.
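The fairness notions discussed above (equalized odds, predictive quality parity) measure gaps in error rates between subgroups defined by a sensitive attribute. As a minimal illustrative sketch, not FairFLRep's implementation, the equalized-odds gap for a binary classifier can be computed as the larger of the true-positive-rate and false-positive-rate disparities across groups (the helper name `equalized_odds_gap` is hypothetical):

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    """Max gap in TPR (y=1) and FPR (y=0) across subgroups; lower is fairer."""
    gaps = []
    for label in (1, 0):  # label=1 gives the TPR gap, label=0 the FPR gap
        rates = []
        for g in np.unique(group):
            mask = (group == g) & (y_true == label)
            # rate of positive predictions among this subgroup's label-`label` examples
            rates.append(np.mean(y_pred[mask] == 1))
        gaps.append(max(rates) - min(rates))
    return max(gaps)

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 1])
group  = np.array([0, 0, 1, 1, 0, 0, 1, 1])
print(equalized_odds_gap(y_true, y_pred, group))  # 1.0: FPR is 0.0 for group 0, 1.0 for group 1
```

A repair method that optimizes equalized odds drives this gap toward zero while keeping overall accuracy high.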