AI Summary
In multi-stage recommendation systems (retrieval → ranking → serving), model-level fairness fails to guarantee equitable final recommendation utility across user groups, a limitation formally identified for the first time in this work.
Method: We propose the first end-to-end system-level fairness framework, which optimizes for equitable final utility across user groups by jointly modeling component interactions and heterogeneous user preferences. To accommodate industrial black-box components (e.g., proprietary retrieval or ranking modules), we adapt closed-box optimization tools such as Bayesian optimization to tune the pipeline jointly for utility and equity.
Contribution/Results: We introduce a rigorous system-level fairness definition, a scalable joint optimization paradigm, and practical support for production-grade closed-source pipelines. Experiments on synthetic and real-world datasets demonstrate that our approach significantly reduces cross-group utility disparity, improving fairness by 37% over single-model baselines while preserving recommendation quality.
Abstract
Fairness research in machine learning often centers on ensuring equitable performance of individual models. However, real-world recommendation systems are built on multiple models and even multiple stages, from candidate retrieval to scoring and serving, which raises challenges for responsible development and deployment. This system-level view, as highlighted by regulations like the EU AI Act, necessitates moving beyond auditing individual models as independent entities. We propose a holistic framework for modeling system-level fairness, focusing on the end-utility delivered to diverse user groups, and consider interactions between components such as retrieval and scoring models. We provide formal insights on the limitations of focusing solely on model-level fairness and highlight the need for alternative tools that account for heterogeneity in user preferences. To mitigate system-level disparities, we adapt closed-box optimization tools (e.g., BayesOpt) to jointly optimize utility and equity. We empirically demonstrate the effectiveness of our proposed framework on synthetic and real datasets, underscoring the need for a system-level framework.
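As a rough illustration of the closed-box idea described in the abstract, the Python sketch below treats a toy two-stage pipeline as an opaque function of two knobs and searches for a setting that balances mean utility against the utility gap between two user groups. Everything in it is an assumption made for illustration and is not taken from the paper: the item utilities, the knobs `q` (share of retrieval slots given to a group-A-oriented candidate generator) and `r` (blend weight of a single shared ranker), the penalty weight `lam`, and the objective itself. An exhaustive grid stands in for Bayesian optimization to keep the example dependency-free; a BayesOpt library would replace the grid with a surrogate-guided search over the same opaque function.

```python
from itertools import product

# Hypothetical catalogue: (utility to group A, utility to group B) per item.
ITEMS = [(1.0, 0.0), (0.9, 0.1), (0.7, 0.5), (0.5, 0.7), (0.1, 0.9), (0.0, 1.0)]
K_RETRIEVED, N_SERVED = 4, 2


def retrieve(q):
    """Give a share q of the K candidate slots to an A-oriented generator,
    and the remaining slots to a B-oriented generator."""
    slots_a = round(q * K_RETRIEVED)
    by_a = sorted(range(len(ITEMS)), key=lambda i: -ITEMS[i][0])
    by_b = sorted(range(len(ITEMS)), key=lambda i: -ITEMS[i][1])
    chosen = by_a[:slots_a]
    chosen += [i for i in by_b if i not in chosen][: K_RETRIEVED - slots_a]
    return chosen


def serve(candidates, r):
    """One shared ranker scores items as r*util_A + (1-r)*util_B
    and serves the top N candidates."""
    def score(i):
        return r * ITEMS[i][0] + (1 - r) * ITEMS[i][1]
    return sorted(candidates, key=score, reverse=True)[:N_SERVED]


def system_utilities(q, r):
    """Opaque end-to-end pipeline: knobs in, per-group utility of the slate out."""
    served = serve(retrieve(q), r)
    u_a = sum(ITEMS[i][0] for i in served) / N_SERVED
    u_b = sum(ITEMS[i][1] for i in served) / N_SERVED
    return u_a, u_b


def gap(q, r):
    """Absolute difference in end utility between the two groups."""
    u_a, u_b = system_utilities(q, r)
    return abs(u_a - u_b)


def objective(q, r, lam=1.0):
    """Mean utility across groups minus a penalty on the between-group gap."""
    u_a, u_b = system_utilities(q, r)
    return (u_a + u_b) / 2 - lam * abs(u_a - u_b)


# Closed-box search: only evaluate the pipeline, never inspect its internals.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
baseline = (1.0, 1.0)  # both stages tuned toward group A
best = max(product(grid, grid), key=lambda knobs: objective(*knobs))

print("baseline:", system_utilities(*baseline), "gap:", gap(*baseline))
print("best:    ", best, system_utilities(*best), "gap:", gap(*best))
```

In this toy setup the baseline serves group A a utility of about 0.95 and group B about 0.05, while the best grid point serves both groups about 0.6. The search only ever calls `system_utilities`, which is the property that makes the approach applicable when retrieval or ranking components cannot be inspected or differentiated.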