Should Bias Always be Eliminated? A Principled Framework to Use Data Bias for OOD Generalization

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the prevailing assumption in out-of-distribution (OOD) generalization that dataset bias must be uniformly eliminated. Instead, it proposes a paradigm of *selectively preserving and leveraging bias*. Methodologically, it introduces a dual-path framework: one path learns environment-invariant representations, while the other explicitly models and exploits bias features via environment-conditioned estimation and bias-aware prediction—jointly enhancing generalization. Theoretically, it establishes the first necessary and sufficient conditions under which bias can be safely utilized for OOD generalization, and designs a bias-feature extraction and adaptive selection mechanism. Extensive experiments on synthetic benchmarks and standard domain generalization datasets (e.g., PACS, Office-Home) demonstrate significant improvements over state-of-the-art methods, validating robustness and generalization gains. The core contribution lies in overturning the “bias-must-be-removed” consensus and reframing bias as a structured, informative prior for OOD generalization.

📝 Abstract
Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting identified bias to estimate the environmental condition and then using it to select appropriate bias-aware predictors that alleviate environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.
Problem

Research questions and friction points this paper is trying to address.

Under what conditions can biased features be identified and utilized effectively?
How can bias be leveraged to complement invariant representations at inference time?
Does such an approach hold up on synthetic datasets and standard domain generalization benchmarks?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages bias to complement invariant representations
Uses invariance as guidance to extract predictive bias
Exploits bias to estimate and bridge environment gaps
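The dual-path inference scheme described above (an invariant predictor, plus an environment estimate computed from bias features that selects among bias-aware predictors) can be sketched minimally in NumPy. This is an illustrative assumption-laden sketch, not the paper's actual implementation: the linear projections stand in for learned encoders, and the soft environment-mixture combination rule, all dimensions, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical sizes: input dim, invariant/bias subspace dims,
# number of training environments, number of classes.
D, D_INV, D_BIAS, N_ENV, N_CLS = 8, 4, 4, 3, 2

# Two paths: an invariant extractor and a bias extractor
# (random linear maps standing in for learned encoders).
W_inv = rng.standard_normal((D, D_INV))
W_bias = rng.standard_normal((D, D_BIAS))

# Invariant predictor shared across environments.
W_pred_inv = rng.standard_normal((D_INV, N_CLS))

# Environment estimator on bias features, plus one
# bias-aware predictor per training environment.
W_env = rng.standard_normal((D_BIAS, N_ENV))
W_pred_bias = rng.standard_normal((N_ENV, D_BIAS, N_CLS))

def predict(x):
    """Combine invariant and bias-aware predictions at inference."""
    z_inv = x @ W_inv      # environment-invariant representation
    z_bias = x @ W_bias    # bias (environment-specific) representation

    # Soft estimate of which training environment x resembles.
    p_env = softmax(z_bias @ W_env)

    logits_inv = z_inv @ W_pred_inv
    # Mixture of per-environment bias-aware predictors,
    # weighted by the estimated environment posterior.
    logits_bias = np.einsum('e,ec->c', p_env, z_bias @ W_pred_bias)

    return softmax(logits_inv + logits_bias)

x = rng.standard_normal(D)
probs = predict(x)
print(probs.shape, float(probs.sum()))
```

The design choice sketched here is to blend, rather than hard-select, the per-environment predictors: weighting by the environment posterior degrades gracefully when a test sample falls between training environments.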
🔎 Similar Papers
2023-09-03 · ACM Transactions on Software Engineering and Methodology · Citations: 17
2024-05-20 · arXiv.org · Citations: 0
👥 Authors
Yan Li (Mohamed bin Zayed University of Artificial Intelligence)
Guangyi Chen (Carnegie Mellon University)
Yunlong Deng (Mohamed bin Zayed University of Artificial Intelligence)
Zijian Li (Carnegie Mellon University)
Zeyu Tang (Postdoctoral Scholar, Stanford University) · Trustworthy AI, Causality, Computational Justice
Anpeng Wu (Zhejiang University) · ML: Causal Learning, Representation Learning, Explainable AI
Kun Zhang (Carnegie Mellon University)