🤖 AI Summary
This work addresses the degradation of cross-domain generalization in object detection under single-source, data-scarce settings, where spurious correlations induced by confounders (such as illumination, co-occurrence, or style) hurt model performance. To mitigate this, we propose Bridge, a novel basis-driven causal inference framework for domain generalization. Bridge learns low-rank bases to implement front-door adjustment, blocking confounding pathways while filtering out redundant and task-irrelevant components to refine representations. The framework integrates with both discriminative and generative vision foundation models (including DINOv2/3, SAM, and Stable Diffusion) and achieves state-of-the-art performance across multiple domain generalization benchmarks: Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather, and Diverse Weather DroneVehicle, a newly augmented real-world UAV-based benchmark.
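To make the idea of filtering with a low-rank basis concrete, here is a minimal sketch, not Bridge's actual module. It assumes a hand-picked orthonormal basis (the paper learns its bases, and this summary does not say how), and the names `project_onto_basis`, `basis`, and `feature` are hypothetical. Projecting a feature onto the span of a few basis vectors keeps only the components the basis can express and discards the rest.

```python
# Illustrative only: project a feature vector onto a small orthonormal basis.
# Assumption: the basis is fixed by hand here; Bridge learns its bases.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_basis(feature, basis):
    """Return the component of `feature` lying in span(basis).

    `basis` must be a list of mutually orthonormal vectors.
    """
    out = [0.0] * len(feature)
    for b in basis:
        coeff = dot(feature, b)  # coordinate of the feature along b
        out = [o + coeff * bi for o, bi in zip(out, b)]
    return out

# A rank-2 basis in a 4-dimensional feature space (hypothetical values).
basis = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0]]
feature = [0.8, -0.3, 0.5, 0.1]

refined = project_onto_basis(feature, basis)           # kept components
residual = [f - r for f, r in zip(feature, refined)]   # filtered-out part
print(refined)
print(residual)
```

In this toy setting the last two coordinates play the role of redundant components: they lie outside the span of the basis and therefore end up in `residual` rather than in `refined`.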
📝 Abstract
Object detectors often degrade on unseen target domains, primarily because of the distributional gap between the source and target domains. The problem is especially pronounced in single-source settings with limited data, where models tend to rely on source-domain confounders (e.g., illumination, co-occurrence, and style), producing spurious correlations that hinder generalization. To address this, we propose Bridge, a novel basis-driven framework for domain generalization that incorporates causal inference into object detection. By learning low-rank bases for front-door adjustment, Bridge blocks the effects of confounders to mitigate spurious correlations, while refining representations by filtering out redundant and task-irrelevant components. Bridge integrates with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments on multiple domain generalization object detection benchmarks, namely Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), show that Bridge outperforms previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
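For readers unfamiliar with front-door adjustment, the standard identity from causal inference is given below. The symbols are assumptions made for illustration and are not taken from the paper: $X$ denotes the input, $M$ a mediator through which $X$ influences the prediction $Y$, and $x'$ ranges over the values of $X$. How Bridge instantiates the mediator with its learned low-rank bases is not specified in this abstract.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
```

The identity holds when $M$ intercepts every directed path from $X$ to $Y$, there is no unblocked back-door path from $X$ to $M$, and every back-door path from $M$ to $Y$ is blocked by $X$. Under these conditions the effect of $X$ on $Y$ can be estimated without observing the confounder.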