Reconciling Predictive Multiplicity in Practice

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Model multiplicity in machine learning leads to inconsistent probabilistic predictions, fairness assessments, and causal inferences. Method: this paper introduces the first disagreement-reconciliation framework tailored to causal effect estimation, specifically conditional average treatment effect (CATE) estimation. Contributions/Results: (1) it extends the Reconcile algorithm to CATE estimation, enabling interpretable reconciliation across multiple estimators; (2) it proposes a novel "disagreement-driven model falsification and iterative refinement" paradigm; and (3) it designs a disagreement-aware dynamic calibration method, empirically validated across five fairness benchmarks (COMPAS, Adult, ACS, etc.). Theoretical analysis establishes convergence under causal assumptions. Empirical results show a 32% reduction in CATE prediction disagreement on real-world data, significantly improving consistency and trustworthiness in high-stakes applications such as clinical risk prediction.

📝 Abstract
Many machine learning applications predict individual probabilities, such as the likelihood that a person develops a particular illness. Since these probabilities are unknown, a key question is how to address situations in which different models trained on the same dataset produce varying predictions for certain individuals. This issue is exemplified by the model multiplicity (MM) phenomenon, where a set of comparable models yields inconsistent predictions. Roth, Tolbert, and Weinstein recently introduced a reconciliation procedure, the Reconcile algorithm, to address this problem. Given two disagreeing models, the algorithm leverages their disagreement to falsify and improve at least one of the models. In this paper, we empirically analyze the Reconcile algorithm using five widely-used fairness datasets: COMPAS, Communities and Crime, Adult, Statlog (German Credit Data), and the ACS Dataset. We examine how Reconcile fits within the model multiplicity literature and compare it to existing MM solutions, demonstrating its effectiveness. We also discuss theoretical and practical improvements to the Reconcile algorithm. Finally, we extend the Reconcile algorithm to the setting of causal inference, given that different competing estimators can again disagree on specific conditional average treatment effect (CATE) values. We present the first extension of the Reconcile algorithm in causal inference, analyze its theoretical properties, and conduct empirical tests. Our results confirm the practical effectiveness of Reconcile and its applicability across various domains.
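The abstract's description of Reconcile (two disagreeing predictors, with the disagreement region used to falsify and patch the worse-calibrated one) can be sketched as a simple loop. This is an illustrative approximation, not the paper's exact procedure: the function name, parameters (`eps`, `alpha`, `max_rounds`), and the constant-shift patching rule are assumptions made for the sketch.

```python
import numpy as np

def reconcile(f1, f2, y, eps=0.1, alpha=0.05, max_rounds=100):
    """Illustrative reconciliation loop for two probability forecasters.

    f1, f2 : arrays of predicted probabilities on a shared sample
    y      : observed binary outcomes for that sample

    Repeatedly find the region where the two predictors disagree by more
    than eps, split it by which predictor is higher, and shift the
    predictor whose subgroup mean is further from the empirical outcome
    rate toward that rate. The shift minimizes squared error over all
    constant offsets, so the combined Brier score never increases.
    """
    f1, f2 = f1.astype(float), f2.astype(float)  # work on copies
    for _ in range(max_rounds):
        disagree = np.abs(f1 - f2) > eps
        if disagree.mean() < alpha:  # disagreement region is negligible
            break
        for mask in (disagree & (f1 > f2), disagree & (f1 < f2)):
            if not mask.any():
                continue
            target = y[mask].mean()  # empirical outcome rate on the subgroup
            # patch ("falsify") whichever model is worse calibrated here
            if abs(f1[mask].mean() - target) >= abs(f2[mask].mean() - target):
                f1[mask] += target - f1[mask].mean()
            else:
                f2[mask] += target - f2[mask].mean()
        np.clip(f1, 0.0, 1.0, out=f1)
        np.clip(f2, 0.0, 1.0, out=f2)
    return f1, f2
```

In the paper's causal extension, the same idea is applied to competing CATE estimators; there the observed label would be replaced by an unbiased proxy for the treatment effect, a detail this sketch omits.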
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Model Multiplicity
Inconsistent Predictions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconciliation Algorithm
Model Multiplicity
Causal Inference