🤖 AI Summary
Evaluating fairness in machine learning systems faces challenges including ambiguous metric definitions and the difficulty of quantifying trade-offs between utility and fairness. This paper proposes a model-agnostic, scalable multi-objective evaluation framework that unifies the characterization of utility–fairness trade-offs across multiple fairness dimensions, such as group and individual fairness, and supports model comparison and decision-making under single or multiple constraints. The framework characterizes performance along convergence, system capacity, and diversity, and summarizes the results in a joint visualization combining a radar chart with a calibrated measurement table. It also offers a modular evaluation interface compatible with mainstream ML models. Experiments on synthetic and real-world benchmark datasets show that the framework improves the interpretability of evaluation outcomes, cross-model comparability, and decision support.
📝 Abstract
The evaluation of fairness in Machine Learning models involves complex challenges, such as defining appropriate metrics and balancing trade-offs between utility and fairness, and important gaps remain at this stage. This work presents a novel multi-objective evaluation framework that enables the analysis of utility-fairness trade-offs in Machine Learning systems. The framework builds on criteria from Multi-Objective Optimization that capture comprehensive information about this complex evaluation task. The assessment of multiple Machine Learning systems is summarized, both quantitatively and qualitatively, in a straightforward manner through a radar chart and a measurement table covering aspects such as convergence, system capacity, and diversity. This compact representation of performance lets decision-makers compare different Machine Learning strategies in real-world applications with single or multiple fairness requirements. The framework is model-agnostic and flexible: it can be adapted to any kind of Machine Learning system, black- or white-box, and to any number and type of evaluation metrics, including multidimensional fairness criteria. The functionality and effectiveness of the proposed framework are demonstrated through simulations and an empirical study conducted on a real-world dataset with various Machine Learning systems.
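The kind of analysis the abstract describes can be sketched as follows: treat each model's (utility, fairness) scores as a point in objective space, extract the non-dominated (Pareto) set, and compute simple indicators loosely mirroring the convergence, system-capacity, and diversity aspects the framework reports. This is a minimal illustrative sketch, not the paper's actual implementation; the function names, the indicator definitions, and the example scores are all assumptions made here for illustration.

```python
import numpy as np

def pareto_front(points):
    """Return the non-dominated subset of (utility, fairness) points.

    Both objectives are assumed to be maximized (e.g. accuracy and
    1 - demographic-parity gap); this convention is an assumption here.
    """
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            np.all(q >= p) and np.any(q > p)
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return pts[keep]

def evaluate(points, ideal=(1.0, 1.0)):
    """Toy indicators loosely mirroring convergence / capacity / diversity."""
    front = pareto_front(points)
    # Convergence: mean distance of the front to an ideal point.
    convergence = float(np.mean(np.linalg.norm(front - np.asarray(ideal), axis=1)))
    # Capacity: number of non-dominated trade-off solutions found.
    capacity = len(front)
    # Diversity: total spread along the front, ordered by the utility axis.
    order = front[np.argsort(front[:, 0])]
    diversity = (
        float(np.sum(np.linalg.norm(np.diff(order, axis=0), axis=1)))
        if capacity > 1 else 0.0
    )
    return {"convergence": convergence, "capacity": capacity, "diversity": diversity}

# Hypothetical (accuracy, 1 - demographic-parity gap) scores for four models.
scores = [(0.92, 0.60), (0.85, 0.80), (0.70, 0.95), (0.68, 0.75)]
print(evaluate(scores))  # the fourth model is dominated and drops out
```

Indicators like these, one per evaluation aspect and normalized to a common scale, are exactly the kind of quantities a radar chart can then overlay per Machine Learning system for side-by-side comparison.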