Superpowering Open-Vocabulary Object Detectors for X-ray Vision

📅 2025-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address data scarcity and the modality gap between RGB and X-ray imagery in open-vocabulary object detection (OvOD) for X-ray security screening, this paper proposes RAXO, a training-free framework that repurposes off-the-shelf RGB OvOD detectors for X-ray detection. Its core contributions are: (1) an X-ray material transfer mechanism that enriches web-retrieved RGB images to emulate X-ray appearance; and (2) a dual-source retrieval strategy that builds X-ray class descriptors without requiring a labeled database. These visual descriptors replace the text-based classifier, so classification relies on intra-modal (visual-to-visual) feature distances instead of cross-modal text-to-image matching. RAXO improves average mAP by up to 17.0 points over the base detectors. The paper also introduces DET-COMPASS, a benchmark with bounding-box annotations for over 300 object categories. Code and dataset are available at https://github.com/PAGF188/RAXO.

📝 Abstract

Open-vocabulary object detection (OvOD) is set to revolutionize security screening by enabling systems to recognize any item in X-ray scans. However, developing effective OvOD models for X-ray imaging presents unique challenges due to data scarcity and the modality gap that prevents direct adoption of RGB-based solutions. To overcome these limitations, we propose RAXO, a training-free framework that repurposes off-the-shelf RGB OvOD detectors for robust X-ray detection. RAXO builds high-quality X-ray class descriptors using a dual-source retrieval strategy. It gathers relevant RGB images from the web and enriches them via a novel X-ray material transfer mechanism, eliminating the need for labeled databases. These visual descriptors replace text-based classification in OvOD, leveraging intra-modal feature distances for robust detection. Extensive experiments demonstrate that RAXO consistently improves OvOD performance, providing an average mAP increase of up to 17.0 points over base detectors. To further support research in this emerging field, we also introduce DET-COMPASS, a new benchmark featuring bounding box annotations for over 300 object categories, enabling large-scale evaluation of OvOD in X-ray. Code and dataset available at: https://github.com/PAGF188/RAXO.
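
The idea of replacing text-based classification with visual descriptors can be illustrated with a small sketch. It is not taken from the RAXO code base: the function names, the toy three-dimensional features, and the use of cosine similarity as the intra-modal distance are all assumptions made for illustration. Each detected region is assigned the class whose visual descriptor is most similar to the region's feature vector.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def classify_region(region_feature, class_descriptors):
    """Return (label, score) for the most similar visual class descriptor.

    `class_descriptors` maps a class name to one visual feature vector.
    This stands in for the text embeddings an OvOD detector would
    normally compare against (hypothetical interface).
    """
    scores = {
        label: cosine_similarity(region_feature, descriptor)
        for label, descriptor in class_descriptors.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy features; a real system would take them from the detector's visual encoder.
descriptors = {"knife": [0.9, 0.1, 0.0], "bottle": [0.0, 0.2, 0.9]}
label, score = classify_region([0.8, 0.2, 0.1], descriptors)
print(label)
```

Because both the region feature and the descriptors live in the same visual feature space, the comparison never crosses the text-image boundary, which is the property the abstract refers to as intra-modal feature distance.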

Problem

Research questions and friction points this paper is trying to address.

Overcoming data scarcity in X-ray object detection

Bridging the modality gap between RGB and X-ray images

Enhancing open-vocabulary detection without labeled X-ray data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free RGB OvOD adaptation for X-ray

Dual-source retrieval for X-ray class descriptors

X-ray material transfer enriches RGB images
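
One way to picture how features from two sources could be merged into a single class descriptor is sketched below. The summary does not specify how RAXO aggregates its retrieved images, so the averaging rule, the function name `build_class_descriptor`, and the two input pools are assumptions chosen for illustration only.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def build_class_descriptor(source_a_features, source_b_features):
    """Average unit-length features from two pools into one prototype.

    Assumption: a plain mean over both pools, followed by re-normalization.
    The actual aggregation used by the paper may differ.
    """
    gallery = list(source_a_features) + list(source_b_features)
    if not gallery:
        raise ValueError("at least one feature vector is required")
    normalized = [l2_normalize(f) for f in gallery]
    dim = len(normalized[0])
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


# Toy two-dimensional features standing in for encoder outputs.
descriptor = build_class_descriptor([[1.0, 0.0]], [[0.0, 1.0]])
print(descriptor)
```

Normalizing each feature before averaging keeps any single image with a large feature norm from dominating the prototype.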

🔎 Similar Papers

A Lightweight Modular Framework for Low-Cost Open-Vocabulary Object Detection Training