Bridge: Basis-Driven Causal Inference Marries VFMs for Domain Generalization

📅 2026-04-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses degraded cross-domain generalization in object detection under single-source, data-scarce settings, where confounders such as illumination, co-occurrence, and style induce spurious correlations that hurt performance on unseen domains. It proposes Bridge, a basis-driven causal inference framework for domain generalization. Bridge learns low-rank bases to implement front-door adjustment, blocking confounders' effects while filtering out task-irrelevant and redundant components to refine representations. The framework integrates with both discriminative vision foundation models (DINOv2/3, SAM) and generative ones (Stable Diffusion), and the authors report improvements over previous state-of-the-art approaches on multiple domain generalization detection benchmarks: Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather, and Diverse Weather DroneVehicle, a newly augmented real-world UAV-based benchmark.
📝 Abstract
Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single-source settings with limited data, as models tend to rely on confounders (e.g., illumination, co-occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel basis-driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning low-rank bases for front-door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations, while simultaneously refining representations by filtering redundant and task-irrelevant components. Bridge can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments across multiple domain generalization object detection datasets, i.e., Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), underscore the superiority of our proposed method over previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
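For readers unfamiliar with front-door adjustment, the sketch below checks the standard identity P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x') on a toy discrete model. It is not the paper's method: the binary variables U (hidden confounder), X, M (mediator), Y and every probability table are hypothetical values chosen for illustration, and the abstract does not say how Bridge's low-rank bases realize the mediator.

```python
import numpy as np

# Hypothetical toy model (NOT from the paper): U -> X, U -> Y, X -> M -> Y.
p_u = np.array([0.6, 0.4])                 # P(U)
p_x_given_u = np.array([[0.8, 0.2],        # P(X | U=0)
                        [0.3, 0.7]])       # P(X | U=1)
p_m_given_x = np.array([[0.9, 0.1],        # P(M | X=0)
                        [0.2, 0.8]])       # P(M | X=1)
p_y1_given_mu = np.array([[0.1, 0.5],      # P(Y=1 | M=0, U=0/1)
                          [0.6, 0.9]])     # P(Y=1 | M=1, U=0/1)

# Full joint P(u, x, m, y), then marginalize out the hidden confounder U.
joint = np.zeros((2, 2, 2, 2))
for u in range(2):
    for x in range(2):
        for m in range(2):
            base = p_u[u] * p_x_given_u[u, x] * p_m_given_x[x, m]
            joint[u, x, m, 1] = base * p_y1_given_mu[m, u]
            joint[u, x, m, 0] = base * (1.0 - p_y1_given_mu[m, u])
obs = joint.sum(axis=0)                    # observable P(x, m, y)


def front_door(obs, x):
    """P(Y=1 | do(X=x)) estimated from observational P(x, m, y) only."""
    p_x = obs.sum(axis=(1, 2))             # P(x')
    p_xm = obs.sum(axis=2)                 # P(x', m)
    p_m_x = p_xm / p_x[:, None]            # P(m | x')
    p_y1_xm = obs[:, :, 1] / p_xm          # P(Y=1 | x', m)
    inner = (p_y1_xm * p_x[:, None]).sum(axis=0)   # sum over x', one value per m
    return float((p_m_x[x] * inner).sum())


def ground_truth(x):
    """P(Y=1 | do(X=x)) computed with access to the hidden U."""
    inner = (p_y1_given_mu * p_u[None, :]).sum(axis=1)  # one value per m
    return float((p_m_given_x[x] * inner).sum())


def naive(obs, x):
    """Plain conditional P(Y=1 | X=x), which is biased by U."""
    return float(obs[x, :, 1].sum() / obs[x].sum())


for x in range(2):
    print(x, front_door(obs, x), ground_truth(x), naive(obs, x))
```

In this toy model the front-door estimate matches the interventional ground truth without ever observing U, while the plain conditional does not; that gap is the spurious correlation the abstract refers to.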
Problem

Research questions and friction points this paper is trying to address.

domain generalization
object detection
spurious correlations
distributional gap
confounders
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Inference
Domain Generalization
Vision Foundation Models
Front-Door Adjustment
Low-Rank Bases