"Who experiences large model decay and why?"A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift

📅 2025-05-31
🤖 AI Summary
When machine learning models are deployed across heterogeneous environments, performance degradation is often uneven across subgroups. Existing methods either explain only mean-level distributional shifts or isolate vulnerable subgroups without jointly identifying *where* degradation occurs and *why* it arises. This paper introduces SHIFT, the first hierarchical inference framework that unifies subgroup scanning, hierarchical causal inference, variable subset sensitivity analysis, and interpretable shift attribution. SHIFT identifies degraded subgroups and, at the same time, disentangles the underlying causes by distinguishing covariate shift from outcome shift. Evaluated on real-world deployments, SHIFT generates human-interpretable attributions of performance degradation and guides targeted interventions: it significantly improves performance for affected subgroups while avoiding negative transfer to others.

📝 Abstract
Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable(subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
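To make the "Where?" question concrete, the following is a minimal sketch of comparing per-subgroup accuracy between a source and a target deployment and flagging subgroups whose drop exceeds a tolerance. It is not the SHIFT procedure: SHIFT scans candidate subgroups with formal inference, whereas this sketch compares raw point estimates over predefined groups with no statistical test. The function names, the `tolerance` parameter, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return {subgroup: accuracy} from (subgroup, y_true, y_pred) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


def flag_decayed_subgroups(source, target, tolerance=0.05):
    """Return {subgroup: decay} where the accuracy drop exceeds `tolerance`.

    `tolerance` is a hypothetical stand-in for "unacceptably large" decay.
    Only subgroups observed in both datasets are compared.
    """
    acc_src = subgroup_accuracy(source)
    acc_tgt = subgroup_accuracy(target)
    flagged = {}
    for g in acc_src.keys() & acc_tgt.keys():
        decay = acc_src[g] - acc_tgt[g]
        if decay > tolerance:
            flagged[g] = decay
    return flagged


# Toy data (illustrative only): (subgroup, true label, predicted label).
source = [("young", 1, 1), ("young", 0, 0), ("old", 1, 1), ("old", 0, 0)]
target = [("young", 1, 1), ("young", 0, 0), ("old", 1, 0), ("old", 0, 0)]
flagged = flag_decayed_subgroups(source, target)
print(flagged)
```

In this toy example only the "old" subgroup is flagged, since its accuracy falls while the "young" subgroup is unchanged. A real analysis would need uncertainty estimates before acting on such a gap, which is the role the abstract assigns to SHIFT's inference step.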
Problem

Research questions and friction points this paper is trying to address.

Identifies subgroups with significant performance decay
Explains causes of decay via variable-specific shifts
Proposes targeted actions to mitigate performance degradation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework for performance drift diagnosis
Subgroup-scanning to identify affected subgroups
Variable-specific shift analysis for targeted mitigation
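The distinction between covariate shift and outcome shift can be illustrated with a simple stratified decomposition. The sketch below assumes a single discrete stratifying variable and known per-stratum shares and mean losses. It splits the change in average loss into a part due to changed stratum shares (covariate shift) and a part due to changed per-stratum loss (outcome shift). This is a textbook-style decomposition chosen for illustration, not the estimator used by SHIFT, and all names and numbers are hypothetical.

```python
def decompose_loss_change(source, target):
    """Split the change in mean loss into covariate and outcome parts.

    source, target: dicts mapping stratum -> (share, mean_loss), where the
    shares within each dict sum to 1. Both dicts must cover the same strata.
    """
    if source.keys() != target.keys():
        raise ValueError("source and target must cover the same strata")
    covariate = 0.0
    outcome = 0.0
    for stratum, (p_src, loss_src) in source.items():
        p_tgt, loss_tgt = target[stratum]
        # Change in stratum share, holding per-stratum loss at its source value.
        covariate += (p_tgt - p_src) * loss_src
        # Change in per-stratum loss, weighted by the target stratum share.
        outcome += p_tgt * (loss_tgt - loss_src)
    return covariate, outcome


# Toy numbers (illustrative only): stratum -> (share, mean loss).
source = {"A": (0.5, 0.10), "B": (0.5, 0.30)}
target = {"A": (0.25, 0.10), "B": (0.75, 0.40)}

covariate_part, outcome_part = decompose_loss_change(source, target)
total_change = (sum(p * loss for p, loss in target.values())
                - sum(p * loss for p, loss in source.values()))
print(covariate_part, outcome_part, total_change)
```

The two parts add up to the total change in mean loss by construction. The split depends on the order of conditioning: here the interaction between the two shifts is assigned to the outcome part, and weighting by source shares instead would move it to the covariate part.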
Harvineet Singh
University of California, San Francisco, USA
Fan Xia
University of California, San Francisco, USA
Alexej Gossmann
Independent researcher
Andrew Chuang
University of California, San Francisco, USA
Julian C. Hong
University of California, San Francisco, USA
Jean Feng
Department of Epidemiology and Biostatistics, University of California, San Francisco
machine learning, statistics, biostatistics