🤖 AI Summary
In high-stakes domains such as healthcare and finance, identifying subpopulations on which machine learning models perform exceptionally, either significantly better or worse than expected, remains challenging. Method: This paper proposes Conformalized Exceptional Model Mining, a framework that integrates Conformal Prediction, with its statistical coverage guarantees, into Exceptional Model Mining (EMM), enabling interpretable localization of performance deviations in both multi-class classification and regression tasks. Within this framework, the authors develop mSMoPE (multiplex Soft Model Performance Evaluation), a model class that quantifies uncertainty via conformal prediction, and RAUL (Relative Average Uncertainty Loss), a quality measure scoring how exceptional a subgroup's performance pattern is. Contribution/Results: Evaluated on diverse real-world datasets, the framework uncovers semantically meaningful, interpretable subgroups whose performance deviates exceptionally, offering insight into model behavior and laying groundwork for more reliable model diagnostics in safety-critical applications.
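As a rough illustration of the coverage guarantee mentioned above, here is a minimal split-conformal sketch for multi-class classification in Python. The nonconformity score (one minus the true-class probability) and the function name are standard textbook choices assumed for illustration, not necessarily the paper's exact construction:

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split conformal prediction for multi-class classification.

    Nonconformity score: 1 - predicted probability of the true class.
    The returned sets contain the true label with probability >= 1 - alpha
    (marginally, assuming exchangeable calibration and test data).
    """
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    qhat = np.quantile(scores, level, method="higher")
    # A label enters the prediction set iff its nonconformity score <= qhat,
    # i.e. its predicted probability >= 1 - qhat.
    return test_probs >= 1.0 - qhat
```

Larger prediction sets signal higher model uncertainty on a given example, which is what makes this machinery usable as a soft performance signal at the subgroup level.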
📝 Abstract
Understanding the nuanced performance of machine learning models is essential for responsible deployment, especially in high-stakes domains like healthcare and finance. This paper introduces a novel framework, Conformalized Exceptional Model Mining, which combines the rigor of Conformal Prediction with the explanatory power of Exceptional Model Mining (EMM). The proposed framework identifies cohesive subgroups within data where model performance deviates exceptionally, highlighting regions of both high confidence and high uncertainty. We develop a new model class, mSMoPE (multiplex Soft Model Performance Evaluation), which quantifies uncertainty through the rigorous coverage guarantees of Conformal Prediction. By defining a new quality measure, Relative Average Uncertainty Loss (RAUL), our framework isolates subgroups with exceptional performance patterns in multi-class classification and regression tasks. Experimental results across diverse datasets demonstrate the framework's effectiveness in uncovering interpretable subgroups that provide critical insights into model behavior. This work lays the groundwork for enhancing model interpretability and reliability, advancing the state of the art in explainable AI and uncertainty quantification.
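The abstract does not spell out RAUL's formula. As a hedged illustration only, one natural reading is to compare a subgroup's average conformal uncertainty (e.g., prediction-set size) against the population average; the sketch below uses that hypothetical definition, and both the function name `raul_like_score` and the ratio form are assumptions, not the paper's definition:

```python
import numpy as np

def raul_like_score(set_sizes, subgroup_mask):
    """Hypothetical RAUL-style subgroup quality measure (illustrative only).

    Uses prediction-set size as an uncertainty proxy and scores a subgroup
    by its mean uncertainty relative to the full population; scores far
    from 1.0 flag exceptionally confident or uncertain subgroups. The
    paper's actual Relative Average Uncertainty Loss may differ.
    """
    return set_sizes[subgroup_mask].mean() / set_sizes.mean()

# Illustrative usage: rank candidate subgroup descriptions (here a made-up
# attribute condition) by how far their score deviates from 1.0.
# set_sizes = pred_sets.sum(axis=1)   # from the conformal sketch above
# score = raul_like_score(set_sizes, X["age"] > 65)
```

In an EMM-style search, such a score would be computed for many candidate attribute-value descriptions, with the most extreme scores surfacing the interpretable subgroups the framework is after.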