🤖 AI Summary
This work addresses a critical gap in conformal prediction by shifting focus from procedural fairness to substantive fairness in downstream decision-making. We propose a novel framework that integrates substantive fairness into conformal prediction through a label-clustering-based approach for constructing prediction sets, thereby enhancing outcome equity. To evaluate fairness in multimodal settings, we further develop an assessment mechanism assisted by large language models. Theoretically, we derive an interpretable decomposition of the upper bound on disparities in prediction set sizes across groups. Empirical results demonstrate that balancing prediction set sizes across subgroups yields more equitable decisions than merely ensuring marginal coverage, highlighting the importance of outcome-oriented fairness in conformal inference.
📝 Abstract
Conformal prediction (CP) offers distribution-free uncertainty quantification for machine learning models, yet its interplay with fairness in downstream decision-making remains underexplored. Moving beyond CP as a standalone operation (procedural fairness), we analyze the holistic decision-making pipeline to evaluate substantive fairness: the equity of downstream outcomes. Theoretically, we derive an upper bound that decomposes prediction-set size disparity into interpretable components, clarifying how label-clustered CP helps control method-driven contributions to unfairness. To facilitate scalable empirical analysis, we introduce an LLM-in-the-loop evaluator that approximates human assessment of substantive fairness across diverse modalities. Our experiments reveal that label-clustered CP variants consistently deliver superior substantive fairness. Finally, we empirically show that equalized set sizes, rather than coverage, strongly correlate with improved substantive fairness, enabling practitioners to design fairer CP systems. Our code is available at https://github.com/layer6ai-labs/llm-in-the-loop-conformal-fairness.
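The central quantity in the abstract, per-group prediction-set size under conformal prediction, can be illustrated with a minimal split-conformal sketch. This is not the authors' implementation; the data, groups, and nonconformity score (one minus the true-class probability) are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy calibration data: softmax-like scores for 3 classes (hypothetical).
n_cal, n_classes = 500, 3
probs_cal = rng.dirichlet(np.ones(n_classes), size=n_cal)
y_cal = np.array([rng.choice(n_classes, p=p) for p in probs_cal])

# Split conformal with the (1 - true-class-probability) nonconformity score.
alpha = 0.1  # target miscoverage rate
scores = 1.0 - probs_cal[np.arange(n_cal), y_cal]
q_level = np.ceil((n_cal + 1) * (1 - alpha)) / n_cal
qhat = np.quantile(scores, q_level, method="higher")

def prediction_set(probs):
    """All classes whose nonconformity score falls below the threshold."""
    return np.nonzero(1.0 - probs <= qhat)[0]

# Substantive-fairness proxy: compare average set sizes across two
# hypothetical subgroups; a large gap signals outcome disparity.
n_test = 200
probs_test = rng.dirichlet(np.ones(n_classes), size=n_test)
groups = rng.integers(0, 2, size=n_test)
sizes = np.array([len(prediction_set(p)) for p in probs_test])
for g in (0, 1):
    print(f"group {g}: mean set size = {sizes[groups == g].mean():.2f}")
```

The paper's label-clustered variants would replace the single global threshold `qhat` with thresholds computed per label cluster, which is what controls the method-driven component of the set-size disparity bound.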