🤖 AI Summary
X-ray prohibited-item detection faces core challenges including severe occlusion, device heterogeneity, scarce annotated data, and inconsistent evaluation protocols. To address these, this work introduces the first systematic benchmarking framework covering six public X-ray datasets and ten mainstream detection architectures—including CNNs, Transformers, and hybrid models—evaluated using multi-dimensional metrics: mAP₅₀, mAP₅₀:₉₅, inference latency, parameter count, and GFLOPS. Our analysis reveals critical generalization bottlenecks in real-world security screening scenarios and uncovers fundamental compute-accuracy trade-offs. Key findings include: lightweight CNNs demonstrate superior practicality on resource-constrained devices, while Transformers exhibit only marginal gains under heavy occlusion; cross-device generalization remains a persistent challenge. All code, pretrained weights, and evaluation results are publicly released, establishing a reproducible benchmark and actionable guidance for the community.
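The summary reports inference latency as one of the efficiency metrics, but does not spell out the timing protocol. A minimal, hedged sketch of how per-image latency is typically measured (warm-up runs discarded, then averaging over repeated forward passes) is given below; `model` stands for any detector's forward-pass callable and is an illustrative placeholder, not an API from this work.

```python
import time

def measure_latency_ms(model, inputs, warmup=10, runs=100):
    """Average per-call inference latency in milliseconds.

    `model` is any callable (e.g. a detector's forward pass).
    Warm-up iterations are executed first and discarded, so one-off
    costs (lazy initialization, caching) do not skew the timing.
    """
    for _ in range(warmup):
        model(inputs)
    start = time.perf_counter()
    for _ in range(runs):
        model(inputs)
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0  # seconds -> milliseconds per run
```

Note that for GPU-based detectors a real benchmark would also need device synchronization before reading the clock; this sketch only illustrates the averaging scheme.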
📝 Abstract
Automated X-ray inspection is crucial for efficient and unobtrusive security screening in various public settings. However, challenges such as object occlusion, variations in the physical properties of items, diversity in X-ray scanning devices, and limited training data hinder the accurate and reliable detection of illicit items. Despite the large body of research in the field, reported experimental evaluations are often incomplete and frequently reach conflicting conclusions. To shed light on the research landscape and facilitate further research, a systematic, detailed, and thorough comparative evaluation of recent Deep Learning (DL)-based methods for X-ray object detection is conducted. To this end, a comprehensive evaluation framework is developed, composed of: a) Six recent, large-scale, and widely used public datasets for X-ray illicit-item detection (OPIXray, CLCXray, SIXray, EDS, HiXray, and PIDray), b) Ten state-of-the-art object detection schemes covering all main categories in the literature, namely generic Convolutional Neural Network (CNN), custom CNN, generic transformer, and hybrid CNN-transformer architectures, and c) Multiple detection metrics (mAP50 and mAP50:95) and time/computational-complexity metrics (inference time in ms, parameter count in M, and computational load in GFLOPS). A thorough analysis of the results leads to critical observations and insights, emphasizing key aspects such as: a) Overall behavior of the object detection schemes, b) Object-level detection performance, c) Dataset-specific observations, and d) Time efficiency and computational complexity. To support reproducibility of the reported experimental results, the evaluation code and model weights are made publicly available at https://github.com/jgenc/xray-comparative-evaluation.
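For readers unfamiliar with the detection metrics named above: mAP50 averages per-class average precision (AP) at an IoU threshold of 0.5, while mAP50:95 further averages AP over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows the core of that computation for a single class on a single image; it uses a simple all-point area-under-the-PR-curve rule rather than the exact 101-point interpolation used by the COCO protocol, so it is illustrative only, not the paper's evaluation code.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2): intersection area over union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, thr=0.5):
    """AP for one class: preds is a list of (score, box), gts a list of boxes.

    Predictions are matched greedily to unmatched ground truths in
    descending score order; a match counts as a true positive when the
    best IoU reaches `thr` (0.5 corresponds to the mAP50 setting).
    """
    preds = sorted(preds, key=lambda p: -p[0])
    matched = set()
    tp_fp = []  # 1 for true positive, 0 for false positive, per prediction
    for score, box in preds:
        best, best_i = 0.0, -1
        for i, g in enumerate(gts):
            if i in matched:
                continue
            v = iou(box, g)
            if v > best:
                best, best_i = v, i
        if best >= thr:
            matched.add(best_i)
            tp_fp.append(1)
        else:
            tp_fp.append(0)
    # Accumulate precision over recall increments (all-point rule).
    ap, ctp, prev_recall = 0.0, 0, 0.0
    for k, t in enumerate(tp_fp, start=1):
        ctp += t
        recall = ctp / len(gts)
        precision = ctp / k
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Averaging `average_precision` over all classes gives mAP50; repeating that for `thr` in 0.5, 0.55, ..., 0.95 and averaging again gives mAP50:95.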