🤖 AI Summary
This work challenges the prevailing assumption in out-of-distribution (OOD) generalization that dataset bias must be uniformly eliminated. Instead, it proposes a paradigm of *selectively preserving and leveraging bias*. Methodologically, it introduces a dual-path framework: one path learns environment-invariant representations, while the other explicitly models and exploits bias features via environment-conditioned estimation and bias-aware prediction, with the two paths jointly enhancing generalization; a bias-feature extraction and adaptive selection mechanism supports this design. Theoretically, it establishes the first necessary and sufficient conditions under which bias can be safely utilized for OOD generalization. Extensive experiments on synthetic benchmarks and standard domain generalization datasets (e.g., PACS, Office-Home) demonstrate significant improvements over state-of-the-art methods, validating the robustness and generalization gains. The core contribution lies in overturning the "bias-must-be-removed" consensus and reframing bias as a structured, informative prior for OOD generalization.
📝 Abstract
Most existing methods for adapting models to out-of-distribution (OOD) domains rely on invariant representation learning to eliminate the influence of biased features. However, should bias always be eliminated -- and if not, when should it be retained, and how can it be leveraged? To address these questions, we first present a theoretical analysis that explores the conditions under which biased features can be identified and effectively utilized. Building on this theoretical foundation, we introduce a novel framework that strategically leverages bias to complement invariant representations during inference. The framework comprises two key components that leverage bias in both direct and indirect ways: (1) using invariance as guidance to extract predictive ingredients from bias, and (2) exploiting the identified bias to estimate the environmental condition and then using that estimate to select appropriate bias-aware predictors, alleviating environment gaps. We validate our approach through experiments on both synthetic datasets and standard domain generalization benchmarks. Results consistently demonstrate that our method outperforms existing approaches, underscoring its robustness and adaptability.
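To make the dual-path idea concrete, here is a minimal sketch, not the paper's implementation: one path predicts from invariant features, while the other uses bias features to estimate the current environment (via nearest prototype, an illustrative choice) and then applies that environment's bias-aware head. All names, shapes, the linear predictors, and the mixing weight `alpha` are assumptions introduced for illustration only.

```python
# Illustrative sketch of a dual-path OOD predictor (assumed design, not
# the paper's actual architecture): invariant path + bias-aware path.
import numpy as np

rng = np.random.default_rng(0)

D_INV, D_BIAS, N_CLASSES, N_ENVS = 4, 3, 2, 2

# Invariant path: a single linear predictor shared across environments.
W_inv = rng.normal(size=(D_INV, N_CLASSES))

# Bias path: well-separated environment prototypes (used to estimate the
# environment from bias features) and one bias-aware head per environment.
env_prototypes = np.array([[ 2.0, 0.0, 0.0],
                           [-2.0, 0.0, 0.0]])
W_bias = rng.normal(size=(N_ENVS, D_BIAS, N_CLASSES))

def predict(x_inv, x_bias, alpha=0.5):
    """Combine invariant logits with logits from the bias-aware head of
    the estimated environment; alpha weights the two paths."""
    # Estimate the environment: nearest prototype to the bias features.
    dists = np.linalg.norm(env_prototypes - x_bias, axis=1)
    env = int(np.argmin(dists))
    logits = alpha * (x_inv @ W_inv) + (1 - alpha) * (x_bias @ W_bias[env])
    return env, int(np.argmax(logits))

# A test point whose bias features resemble environment 1.
x_inv = rng.normal(size=D_INV)
x_bias = env_prototypes[1] + 0.1 * rng.normal(size=D_BIAS)
env, y_hat = predict(x_inv, x_bias)
print(env, y_hat)  # estimated environment and predicted class
```

The key design point mirrored here is that bias features serve two roles: they carry predictive signal through the bias-aware head, and they indirectly inform inference by identifying which environment-specific predictor to apply.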