RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Open-vocabulary detectors generalize poorly to out-of-distribution datasets, while fine-tuning a heavyweight vision-language model (VLM) for each new domain incurs prohibitive computational cost. To address this, the authors propose RF-DETR: a lightweight detection Transformer built on weight-sharing neural architecture search (NAS). RF-DETR revisits the tunable components of the DETR architecture search space, jointly optimizing encoder-decoder topology and multi-scale feature fusion strategies. Crucially, the weight-sharing supernet enables evaluation of thousands of candidate architectures without retraining, allowing rapid exploration of the accuracy-latency Pareto frontier on any target dataset. Experiments show substantial gains: RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency; RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO; and on Roboflow100-VL, RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP while running 20x faster.
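The "accuracy-latency Pareto frontier" mentioned above can be made concrete with a small sketch. The function and the config values below are illustrative, not taken from the RF-DETR codebase; only the nano (48.0 AP) and 2x-large (>60 AP) accuracy figures come from the paper, and the latencies are placeholders.

```python
def pareto_frontier(configs):
    """Return configs not dominated by any other. A config dominates
    another if it is at least as fast AND at least as accurate, and
    strictly better on one of the two axes."""
    frontier = []
    for c in configs:
        dominated = any(
            o["latency_ms"] <= c["latency_ms"] and o["ap"] >= c["ap"]
            and (o["latency_ms"] < c["latency_ms"] or o["ap"] > c["ap"])
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return sorted(frontier, key=lambda c: c["latency_ms"])

# Hypothetical evaluated architectures (latency values are made up).
configs = [
    {"name": "nano",      "latency_ms": 2.3,  "ap": 48.0},
    {"name": "small",     "latency_ms": 3.5,  "ap": 53.0},
    {"name": "slow-weak", "latency_ms": 4.0,  "ap": 50.0},  # dominated by "small"
    {"name": "2x-large",  "latency_ms": 10.0, "ap": 60.1},
]
front = pareto_frontier(configs)
```

A weight-sharing NAS run would feed every sampled subnet's measured (latency, AP) pair through a filter like this, leaving only the configurations worth reporting at each latency budget.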

📝 Abstract
Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the "tunable knobs" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves on prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is at https://github.com/roboflow/rf-detr
Problem

Research questions and friction points this paper is trying to address.

Open-vocabulary detectors fail to generalize to datasets with out-of-distribution classes
Fine-tuning heavyweight VLMs for each new domain is computationally prohibitive
Exploring accuracy-latency tradeoffs normally requires retraining every candidate architecture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight specialist transformer using neural architecture search
Evaluates configurations without retraining via weight-sharing NAS
Optimizes DETR transferability by revisiting tunable NAS knobs
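The "evaluates configurations without retraining" point above is the core of weight-sharing NAS: one supernet is trained once, and candidate subnets are scored by running forward passes over shared weights. The sketch below is a toy illustration under heavy assumptions (each "layer" is a scalar multiply standing in for a Transformer block, and `depth`/`scale` stand in for knobs like decoder depth and width); none of these names come from RF-DETR.

```python
import itertools

MAX_DEPTH = 6
# Shared weights: one parameter per layer, "trained" once for the supernet.
shared_weights = [1.0 + 0.1 * i for i in range(MAX_DEPTH)]

def run_subnet(x, depth, scale):
    """Forward pass of a subnet that reuses the first `depth` shared
    layers plus a width-like `scale` knob. No weights are copied or
    retrained per candidate."""
    for w in shared_weights[:depth]:
        x = x * w
    return x * scale

# Enumerate candidate configs (depth x scale grid) and score each one
# with a single cheap forward pass over the shared weights.
candidates = list(itertools.product(range(1, MAX_DEPTH + 1), [0.5, 1.0]))
scores = {(d, s): run_subnet(1.0, d, s) for d, s in candidates}
```

In a real run, the score would be validation AP and each candidate would also be benchmarked for latency; the point the sketch captures is that the per-candidate cost is inference, not training.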