Handling Out-of-Distribution Data: A Survey

📅 2025-07-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This survey addresses distribution shift between training and deployment in machine learning, focusing on two fundamental challenges: covariate shift (changes in input feature distributions) and concept shift (changes in semantic or class-conditional label distributions, including the emergence of novel classes). It formalizes the shift taxonomy and reviews techniques (distribution shift detection, uncertainty estimation, domain adaptation, anomaly identification, causal inference, and invariant representation learning) within a framework bridging statistical learning and deep learning. Key contributions include: (i) a formalization of distribution shifts and a call for models that handle heterogeneous shift types simultaneously; (ii) an extensive review of methods for detecting, measuring, and mitigating out-of-distribution (OOD) scenarios; and (iii) a critical analysis of the limitations of existing methods in jointly mitigating multiple concurrent shifts and generalizing to unseen classes. The survey also establishes evaluation criteria and outlines future research directions, particularly for compound shifts and semantic evolution, addressing a gap in prior surveys, which largely overlook real-world deployments involving intertwined and dynamically evolving shifts.

📝 Abstract

In Machine Learning (ML) and data-driven applications, one of the significant challenges is the change in data distribution between the training and deployment stages, commonly known as distribution shift. This paper outlines mechanisms for handling two main types of distribution shift: (i) covariate shift, where the values of features (covariates) change between training and test data, and (ii) concept or semantic shift, where the concept the model learned during training no longer holds because novel classes emerge in the test phase. Our contributions are threefold. First, we formalize distribution shifts, describe how conventional methods fail to handle them adequately, and argue for a model that performs well under all types of distribution shift simultaneously. Second, we discuss why handling distribution shifts is important and provide an extensive review of the methods and techniques developed to detect, measure, and mitigate their effects. Third, we discuss the current state of distribution shift handling mechanisms and propose future research directions in this area. Overall, we provide a retrospective synopsis of the literature on distribution shift, focusing on OOD data, which has been overlooked in existing surveys.
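To make the notion of covariate shift concrete, the following is a minimal sketch of one common way to detect it: comparing each feature's empirical distribution in the training and test sets with a two-sample Kolmogorov-Smirnov statistic. This is an illustration only, not a method taken from the paper. The function names, the synthetic Gaussian data, and the 0.2 threshold are all assumptions chosen for the example.

```python
import random

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both pointers past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def covariate_shift_report(train_rows, test_rows, threshold=0.2):
    """Return (feature index, KS statistic, flagged) per feature.

    The threshold is a hypothetical cut-off for illustration; in practice
    one would use a p-value or a calibrated critical value.
    """
    report = []
    for k in range(len(train_rows[0])):
        d = ks_statistic([r[k] for r in train_rows], [r[k] for r in test_rows])
        report.append((k, d, d > threshold))
    return report

# Synthetic data (assumption): feature 0 is shifted at test time, feature 1 is not.
rng = random.Random(0)
train = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
test = [(rng.gauss(1.5, 1), rng.gauss(0, 1)) for _ in range(500)]

for k, d, flagged in covariate_shift_report(train, test):
    print(f"feature {k}: KS={d:.3f} shifted={flagged}")
```

A per-feature test like this only looks at input marginals, so it can flag covariate shift but says nothing about semantic shift; detecting novel classes typically needs a model-based score, such as predictive uncertainty.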

Problem

Research questions and friction points this paper is trying to address.

Addressing distribution shifts between training and deployment data

Detecting and mitigating covariate and concept shifts in ML models

Reviewing methods and proposing future directions for OOD data handling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Formalizing distribution shifts and model requirements

Reviewing detection and mitigation techniques

Proposing future research directions
