Superpowering Open-Vocabulary Object Detectors for X-ray Vision

📅 2025-03-21
🤖 AI Summary
To address data scarcity and the RGB-to-X-ray modality gap in open-vocabulary object detection (OvOD) for X-ray security screening, this paper proposes RAXO, a training-free framework that repurposes off-the-shelf RGB OvOD detectors. Its core contributions are: (1) an X-ray material transfer mechanism that enriches RGB images retrieved from the web; and (2) a dual-source retrieval strategy that builds X-ray class descriptors without requiring labeled databases. These visual descriptors replace the detector's text-based classification, so that classification relies on intra-modal feature distances. The paper also introduces the DET-COMPASS benchmark, which provides bounding-box annotations for over 300 object categories. RAXO improves average mAP by up to 17.0 points over the base detectors. Code and dataset are publicly released.

📝 Abstract
Open-vocabulary object detection (OvOD) is set to revolutionize security screening by enabling systems to recognize any item in X-ray scans. However, developing effective OvOD models for X-ray imaging presents unique challenges due to data scarcity and the modality gap that prevents direct adoption of RGB-based solutions. To overcome these limitations, we propose RAXO, a training-free framework that repurposes off-the-shelf RGB OvOD detectors for robust X-ray detection. RAXO builds high-quality X-ray class descriptors using a dual-source retrieval strategy. It gathers relevant RGB images from the web and enriches them via a novel X-ray material transfer mechanism, eliminating the need for labeled databases. These visual descriptors replace text-based classification in OvOD, leveraging intra-modal feature distances for robust detection. Extensive experiments demonstrate that RAXO consistently improves OvOD performance, providing an average mAP increase of up to 17.0 points over base detectors. To further support research in this emerging field, we also introduce DET-COMPASS, a new benchmark featuring bounding box annotations for over 300 object categories, enabling large-scale evaluation of OvOD in X-ray. Code and dataset available at: https://github.com/PAGF188/RAXO.
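
The idea of replacing text-based classification with visual class descriptors can be illustrated with a small sketch. This is not RAXO's implementation: it assumes each class descriptor is simply the normalized mean of feature vectors extracted from that class's gathered images, and that a detected region is assigned to the class with the highest cosine similarity. The function names (`build_descriptors`, `classify_region`) and the toy 3-dimensional features are hypothetical; in practice the features would come from the detector's visual encoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (returns zeros for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def build_descriptors(
    features_by_class: Dict[str, Sequence[Vector]],
) -> Dict[str, List[float]]:
    """Build one visual descriptor per class.

    Assumption: a descriptor is the normalized mean of the normalized
    image features gathered for that class.
    """
    return {
        name: l2_normalize(mean_vector([l2_normalize(f) for f in feats]))
        for name, feats in features_by_class.items()
    }


def classify_region(
    region_feature: Vector, descriptors: Dict[str, List[float]]
) -> Tuple[str, Dict[str, float]]:
    """Assign a region to the class whose descriptor is most similar.

    Both sides are visual features, so the comparison is intra-modal
    (image-to-image) rather than image-to-text.
    """
    query = l2_normalize(region_feature)
    scores = {
        name: sum(a * b for a, b in zip(query, descriptor))
        for name, descriptor in descriptors.items()
    }
    best = max(scores, key=lambda name: scores[name])
    return best, scores


# Toy features standing in for encoder outputs (hypothetical values).
toy_features = {
    "scissors": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
    "bottle": [[0.0, 0.1, 1.0], [0.1, 0.0, 0.9]],
}
toy_descriptors = build_descriptors(toy_features)
label, similarity = classify_region([0.8, 0.3, 0.1], toy_descriptors)
print(label, similarity)
```

Averaging features is only one way to form a prototype; the sketch uses it because it keeps the intra-modal comparison easy to follow.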

Problem

Research questions and friction points this paper is trying to address.

Overcoming data scarcity in X-ray object detection
Bridging the modality gap between RGB and X-ray images
Enhancing open-vocabulary detection without labeled X-ray data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free RGB OvOD adaptation for X-ray
Dual-source retrieval for X-ray class descriptors
X-ray material transfer enriches RGB images
Pablo Garcia-Fernandez
University of Santiago de Compostela, Spain
Lorenzo Vaquero
Researcher at Fondazione Bruno Kessler (FBK)
visual object tracking, deep learning, computer vision
Mingxuan Liu
University of Trento, Italy
Feng Xue
University of Trento, Italy
Daniel Cores
Assistant Professor, University of Santiago de Compostela
Computer Vision, Deep Learning
N. Sebe
University of Trento, Italy
M. Mucientes
University of Santiago de Compostela, Spain
Elisa Ricci
University of Trento & Fondazione Bruno Kessler
Computer Vision, Deep Learning, Robotics