🤖 AI Summary
This work addresses a gap in existing automated machine learning (AutoML) systems, which typically prioritize predictive performance and neglect fairness, potentially producing discriminatory outcomes against specific demographic groups. To address this, the study integrates fairness directly into the optimization component of an AutoML framework that constructs complete pipelines, covering data selection, feature transformation, model selection, and hyperparameter tuning. Complementary fairness metrics are optimized alongside predictive performance so that different dimensions of fairness are captured during the search. Compared to a performance-only baseline, the fairness-aware search improves fairness by 14.5% on average, reduces data usage by 35.7%, and yields simpler final pipelines, at the cost of a 9.4% decrease in predictive performance.
📝 Abstract
Machine Learning (ML) systems are increasingly used to support decision-making processes that affect individuals. However, these systems often rely on biased data, which can lead to unfair outcomes for specific groups. With the growing adoption of Automated Machine Learning (AutoML), the risk of intensifying discriminatory behaviours increases, as most frameworks focus primarily on model selection to maximise predictive performance. Previous research on fairness in AutoML has largely followed this trend, integrating fairness awareness only into model selection or hyperparameter tuning while neglecting other critical stages of the ML pipeline. This paper studies the impact of integrating fairness directly into the optimisation component of an AutoML framework that constructs complete ML pipelines, from data selection and transformations to model selection and tuning. As selecting appropriate fairness metrics remains a key challenge, our work incorporates complementary fairness metrics to capture different dimensions of fairness during the optimisation. Their integration within AutoML resulted in measurable differences compared to a baseline focused solely on predictive performance. Despite a 9.4% decrease in predictive performance, average fairness improved by 14.5%, accompanied by a 35.7% reduction in data usage. Furthermore, fairness integration produced complete yet simpler final solutions, suggesting that model complexity is not always required to achieve balanced and fair ML solutions.
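To make the idea of optimising complementary fairness metrics alongside predictive performance concrete, the following is a minimal sketch only. The abstract does not name the metrics or the search strategy used, so the two metrics shown (demographic parity difference and equal opportunity difference), the binary group encoding, and the `pareto_front` helper are all assumptions chosen for illustration, not the paper's method.

```python
from typing import List, Sequence, Tuple


def _rate(values: Sequence[int]) -> float:
    """Mean of a 0/1 sequence; 0.0 when the sequence is empty."""
    return sum(values) / len(values) if values else 0.0


def demographic_parity_diff(y_pred: Sequence[int], group: Sequence[int]) -> float:
    """Absolute gap in positive-prediction rate between group 0 and group 1."""
    g0 = [p for p, g in zip(y_pred, group) if g == 0]
    g1 = [p for p, g in zip(y_pred, group) if g == 1]
    return abs(_rate(g0) - _rate(g1))


def equal_opportunity_diff(
    y_true: Sequence[int], y_pred: Sequence[int], group: Sequence[int]
) -> float:
    """Absolute gap in true-positive rate between group 0 and group 1."""
    g0 = [p for t, p, g in zip(y_true, y_pred, group) if g == 0 and t == 1]
    g1 = [p for t, p, g in zip(y_true, y_pred, group) if g == 1 and t == 1]
    return abs(_rate(g0) - _rate(g1))


def pareto_front(candidates: Sequence[Tuple[str, float, float]]) -> List[str]:
    """Return names of candidates not dominated by any other candidate.

    Each candidate is (name, accuracy, unfairness): accuracy is maximised,
    unfairness is minimised. The tuple layout is a hypothetical convention.
    """
    front = []
    for name, acc, unfair in candidates:
        dominated = any(
            a >= acc and u <= unfair and (a > acc or u < unfair)
            for _, a, u in candidates
        )
        if not dominated:
            front.append(name)
    return front
```

The two metrics can disagree on the same predictions, which is one reason to track more than one of them. A multi-objective search would then keep every pipeline on the Pareto front rather than a single best scorer, leaving the choice of trade-off between fairness and accuracy to the practitioner.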