🤖 AI Summary
This work addresses the vulnerability of open-vocabulary object detection to spurious correlations between non-causal visual attributes (such as brightness and texture) and object categories under distribution shifts. To mitigate this issue, the paper introduces the first training-free test-time adaptation framework for this task, incorporating explicit counterfactual reasoning. Specifically, it generates counterfactual views of test images by perturbing non-causal attributes and compares region-level predictions between the original and counterfactual views to quantify attribute sensitivity. Based on this sensitivity, the method selectively suppresses unreliable predictions without online optimization, enabling attribute-specific correction. Experiments show that the proposed approach consistently outperforms existing test-time adaptation methods on PASCAL-C, COCO-C, and FoggyCityscapes, improving model robustness under distribution shifts.
📝 Abstract
Open-vocabulary object detection often fails under distribution shifts, as it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. By perturbing test images along non-causal attributes and comparing region-level predictions between original and counterfactual views, FACTOR quantifies attribute sensitivity, semantic relevance, and prediction variation to selectively suppress attribute-dependent predictions, without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.
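
The perturb-compare-suppress idea described in the abstract can be illustrated with a minimal sketch. This is not the FACTOR implementation: the function names (`perturb_brightness`, `region_sensitivity`, `suppress`), the exponential down-weighting rule, and the `strength` parameter are all assumptions made for illustration, and the sketch models only a simple score-change sensitivity, not the semantic relevance or prediction variation terms the abstract mentions. The region scores are toy numbers standing in for the output of a detector run on the original and the counterfactual view.

```python
import numpy as np


def perturb_brightness(image: np.ndarray, factor: float) -> np.ndarray:
    """Hypothetical counterfactual view: scale intensities, keep them in [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)


def region_sensitivity(orig_scores: np.ndarray, cf_scores: np.ndarray) -> np.ndarray:
    """Per-region absolute change in the score of the originally predicted class.

    Both inputs have shape (num_regions, num_classes) and are assumed to
    describe the same regions in the original and counterfactual views.
    """
    top = orig_scores.argmax(axis=1)
    rows = np.arange(orig_scores.shape[0])
    return np.abs(orig_scores[rows, top] - cf_scores[rows, top])


def suppress(orig_scores: np.ndarray, sensitivity: np.ndarray,
             strength: float = 5.0) -> np.ndarray:
    """Assumed rule: down-weight each region's scores by exp(-strength * sensitivity).

    No parameters are updated; only the output scores are rescaled.
    """
    weights = np.exp(-strength * sensitivity)
    return orig_scores * weights[:, None]


# Toy scores for two regions and two classes (illustrative values only).
orig = np.array([[0.90, 0.10],   # region 0: stable under the perturbation
                 [0.80, 0.20]])  # region 1: prediction flips when brightness changes
cf = np.array([[0.88, 0.12],
               [0.30, 0.70]])

sens = region_sensitivity(orig, cf)
adjusted = suppress(orig, sens)
```

In this toy case the second region, whose top-class score drops sharply in the counterfactual view, is down-weighted far more than the first, which is the qualitative behaviour the abstract describes as suppressing attribute-dependent predictions.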