"Who experiences large model decay and why?" A Hierarchical Framework for Diagnosing Heterogeneous Performance Drift

📅 2025-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

When machine learning models are deployed across heterogeneous environments, performance degradation is often concentrated in specific subgroups. Existing methods either explain only how average performance shifts arise or isolate vulnerable subgroups, without jointly identifying *where* degradation occurs and *why* it arises. This paper introduces SHIFT, the first hierarchical inference framework that unifies subgroup scanning, hierarchical causal inference, variable subset sensitivity analysis, and interpretable shift attribution. SHIFT identifies the degraded subgroups and disentangles the underlying causes, distinguishing covariate shift from outcome shift. Evaluated on real-world deployments, SHIFT produces human-interpretable attributions of performance degradation and suggests targeted interventions that mitigate decay for the affected subgroups while limiting unintended effects on others.

📝 Abstract

Machine learning (ML) models frequently experience performance degradation when deployed in new contexts. Such degradation is rarely uniform: some subgroups may suffer large performance decay while others may not. Understanding where and how large differences in performance arise is critical for designing targeted corrective actions that mitigate decay for the most affected subgroups while minimizing any unintended effects. Current approaches do not provide such detailed insight, as they either (i) explain how average performance shifts arise or (ii) identify adversely affected subgroups without insight into how this occurred. To this end, we introduce a Subgroup-scanning Hierarchical Inference Framework for performance drifT (SHIFT). SHIFT first asks "Is there any subgroup with unacceptably large performance decay due to covariate/outcome shifts?" (Where?) and, if so, dives deeper to ask "Can we explain this using more detailed variable (subset)-specific shifts?" (How?). In real-world experiments, we find that SHIFT identifies interpretable subgroups affected by performance decay, and suggests targeted actions that effectively mitigate the decay.
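The "Where?" question can be illustrated with a small sketch. This is not the SHIFT procedure from the paper: it only compares per-subgroup accuracy between a source and a target dataset and flags subgroups whose drop exceeds a tolerance, with no statistical inference and no separation of covariate from outcome shift. The function name `subgroup_decay`, the record layout (a dict with a boolean `correct` field), and the `tolerance` parameter are all assumptions made for illustration.

```python
def subgroup_decay(source, target, features, tolerance=0.05):
    """Flag single-feature subgroups whose accuracy drops by more than `tolerance`.

    Hypothetical helper: each record is a dict holding feature values
    plus a boolean "correct" (whether the model's prediction was right).
    """
    def accuracy(rows):
        # None signals an empty subgroup, which cannot be compared.
        return sum(r["correct"] for r in rows) / len(rows) if rows else None

    flagged = []
    for feat in features:
        values = {r[feat] for r in source} | {r[feat] for r in target}
        for value in sorted(values):
            acc_src = accuracy([r for r in source if r[feat] == value])
            acc_tgt = accuracy([r for r in target if r[feat] == value])
            if acc_src is None or acc_tgt is None:
                continue
            decay = acc_src - acc_tgt
            if decay > tolerance:
                flagged.append((feat, value, decay))
    # Largest decay first.
    return sorted(flagged, key=lambda item: -item[2])


# Toy data (made up): accuracy is stable for "young" but falls for "old".
source = (
    [{"age": "young", "correct": True}] * 4
    + [{"age": "old", "correct": True}] * 3
    + [{"age": "old", "correct": False}]
)
target = (
    [{"age": "young", "correct": True}] * 4
    + [{"age": "old", "correct": True}]
    + [{"age": "old", "correct": False}] * 3
)

flagged = subgroup_decay(source, target, features=["age"])
print(flagged)
```

On this toy data only the "old" subgroup is flagged. A real analysis would also need to account for sampling noise and for the number of subgroups scanned, which this sketch ignores.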

Problem

Research questions and friction points this paper is trying to address.

Identifies subgroups with significant performance decay

Explains causes of decay via variable-specific shifts

Proposes targeted actions to mitigate performance degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical framework for performance drift diagnosis

Subgroup-scanning to identify affected subgroups

Variable-specific shift analysis for targeted mitigation
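As a rough illustration of looking at variable-specific shifts inside a flagged subgroup, the sketch below ranks categorical variables by the total variation distance between their source and target distributions. This is a stand-in technique chosen for simplicity, not the attribution method described in the paper: it measures how much each variable's marginal distribution moved, not how much of the performance decay that movement explains. The names `total_variation` and `rank_variable_shifts` are hypothetical.

```python
from collections import Counter


def total_variation(xs, ys):
    """Total variation distance between two empirical categorical distributions."""
    px, py = Counter(xs), Counter(ys)
    support = set(px) | set(py)
    # Counter returns 0 for categories missing from one sample.
    return 0.5 * sum(abs(px[k] / len(xs) - py[k] / len(ys)) for k in support)


def rank_variable_shifts(source_rows, target_rows, variables):
    """Rank variables by how far their marginal distribution moved (largest first)."""
    scores = {
        var: total_variation(
            [r[var] for r in source_rows],
            [r[var] for r in target_rows],
        )
        for var in variables
    }
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Toy subgroup (made up): smoking prevalence changes, site mix does not.
src_rows = [
    {"smoker": "yes", "site": "A"},
    {"smoker": "no", "site": "A"},
    {"smoker": "no", "site": "B"},
    {"smoker": "no", "site": "B"},
]
tgt_rows = [
    {"smoker": "yes", "site": "A"},
    {"smoker": "yes", "site": "A"},
    {"smoker": "yes", "site": "B"},
    {"smoker": "no", "site": "B"},
]

ranked = rank_variable_shifts(src_rows, tgt_rows, ["smoker", "site"])
print(ranked)
```

Here `smoker` ranks first because its distribution changed while `site` did not. Linking such a shift to the observed decay, and handling outcome shift, requires the kind of inference the framework is designed for and is outside the scope of this sketch.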

🔎 Similar Papers

Unveiling Group-Specific Distributed Concept Drift: A Fairness Imperative in Federated Learning