DriveFlow: Rectified Flow Adaptation for Robust 3D Object Detection in Autonomous Driving

📅 2025-11-23
🤖 AI Summary
To improve the robustness of vision-based 3D object detection in autonomous driving under out-of-distribution (OOD) scenarios, this paper proposes a fine-tuning-free image editing method that augments training data using pre-trained text-to-image rectified flow models. The approach relies on frequency decomposition to produce augmentations that are both semantically consistent and geometrically faithful. Building on noise-free editing paths derived from text-conditioned velocities, its key contributions are: (1) a high-frequency foreground preservation strategy, driven by a high-frequency alignment loss, that maintains precise 3D object geometry; and (2) dual-frequency background optimization that balances editing flexibility and semantic consistency. Evaluated across OOD scenarios, the method improves 3D detection performance on all object categories while remaining computationally efficient. The implementation is publicly available.
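Frequency decomposition is the building block the method rests on. The sketch below shows one common way to split an image into low- and high-frequency parts: apply an ideal low-pass mask in the 2D discrete Fourier domain and take the high-frequency part as the residual. It is a dependency-free illustration under stated assumptions, not DriveFlow's code; the function names, the square pass band, and the `cutoff` parameter are hypothetical, and the paper's actual filter is not specified here.

```python
import cmath
from typing import List, Tuple

Image = List[List[float]]


def _dft2(grid: List[List[complex]], sign: int) -> List[List[complex]]:
    """Naive 2D DFT (sign=-1) or unnormalised inverse DFT (sign=+1).

    Hypothetical helper, O((H*W)^2): fine for toy grids; real code would use an FFT.
    """
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(
                grid[y][x] * cmath.exp(sign * 2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def frequency_split(img: Image, cutoff: int = 1) -> Tuple[Image, Image]:
    """Split img into (low, high) with an ideal low-pass mask (assumed design).

    A coefficient (u, v) is kept as low frequency when its wrapped distance
    from DC is <= cutoff along both axes; the high part is img - low.
    """
    h, w = len(img), len(img[0])
    spectrum = _dft2(img, sign=-1)
    kept = [
        [
            spectrum[u][v] if min(u, h - u) <= cutoff and min(v, w - v) <= cutoff else 0j
            for v in range(w)
        ]
        for u in range(h)
    ]
    inverse = _dft2(kept, sign=+1)
    # The mask is symmetric, so the inverse is real up to rounding error
    low = [[inverse[y][x].real / (h * w) for x in range(w)] for y in range(h)]
    high = [[img[y][x] - low[y][x] for x in range(w)] for y in range(h)]
    return low, high


checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
low, high = frequency_split(checker, cutoff=1)
# low is ~0.5 everywhere: apart from DC, the checkerboard's energy sits at the
# Nyquist frequency (2, 2), which lies outside the pass band, so it ends up in high
```

In this framing, object edges and fine texture live mostly in `high`, while illumination and coarse layout live mostly in `low`, which is why the two bands can be treated differently for foreground and background.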

📝 Abstract
In autonomous driving, vision-centric 3D object detection recognizes and localizes 3D objects from RGB images. However, due to high annotation costs and diverse outdoor scenes, training data often fails to cover all possible test scenarios, a problem known as the out-of-distribution (OOD) issue. Training-free image editing is a promising way to improve model robustness: it enhances the training data without any modification to pre-trained diffusion models. Nevertheless, inversion-based methods often suffer from limited effectiveness and inherent inaccuracies, while recent rectified-flow-based approaches struggle to preserve objects with accurate 3D geometry. In this paper, we propose DriveFlow, a Rectified Flow Adaptation method for training data enhancement in autonomous driving based on pre-trained Text-to-Image flow models. Based on frequency decomposition, DriveFlow introduces two strategies to adapt noise-free editing paths derived from text-conditioned velocities. 1) High-Frequency Foreground Preservation: DriveFlow incorporates a high-frequency alignment loss on the foreground to maintain precise 3D object geometry. 2) Dual-Frequency Background Optimization: DriveFlow also performs dual-frequency optimization on the background, balancing editing flexibility and semantic consistency. Extensive experiments validate the effectiveness and efficiency of DriveFlow, showing performance improvements on all object categories across OOD scenarios. Code is available at https://github.com/Hongbin98/DriveFlow.
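The abstract builds editing paths from the text-conditioned velocities of a pre-trained rectified flow model instead of inverting images to noise. The sketch below illustrates that general idea, loosely following the structure of inversion-free rectified-flow editors such as FlowEdit: an edit variable starts at the source image and is stepped along the difference between target-prompt and source-prompt velocities. It is not DriveFlow's algorithm. `toy_velocity`, `edit_path`, the prompts, and the timestep schedule are hypothetical, and the "model" is the closed-form rectified-flow velocity of a toy dataset in which every image for a prompt equals one mean vector.

```python
import random
from typing import Callable, Dict, List

Vec = List[float]
# Hypothetical interface: velocity(x, t, prompt) returns dx/dt under the
# rectified-flow convention x_t = (1 - t) * x_0 + t * noise (t = 1 is pure noise).
Velocity = Callable[[Vec, float, str], Vec]


def toy_velocity(means: Dict[str, Vec]) -> Velocity:
    """Toy stand-in for a text-conditioned flow model (not a real network).

    If every image for a prompt equals means[prompt], the exact rectified-flow
    velocity at (x, t) is (x - means[prompt]) / t.
    """
    def velocity(x: Vec, t: float, prompt: str) -> Vec:
        mu = means[prompt]
        return [(xi - mi) / t for xi, mi in zip(x, mu)]
    return velocity


def edit_path(x_src: Vec, src_prompt: str, tar_prompt: str,
              velocity: Velocity, timesteps: List[float], seed: int = 0) -> Vec:
    """Inversion-free edit: integrate target-minus-source velocity from x_src.

    timesteps must decrease from 1.0 towards 0.0; t = 0 is never evaluated.
    """
    rng = random.Random(seed)
    z_edit = list(x_src)  # the edit starts at the source image, not at noise
    for t, t_next in zip(timesteps[:-1], timesteps[1:]):
        noise = [rng.gauss(0.0, 1.0) for _ in x_src]
        # Noisy copy of the source, and the same offset applied to the current edit
        z_src_t = [(1 - t) * xs + t * n for xs, n in zip(x_src, noise)]
        z_tar_t = [ze + zs - xs for ze, zs, xs in zip(z_edit, z_src_t, x_src)]
        v_tar = velocity(z_tar_t, t, tar_prompt)
        v_src = velocity(z_src_t, t, src_prompt)
        # Euler step (t_next < t) along the difference of text-conditioned velocities
        z_edit = [ze + (t_next - t) * (vt - vs)
                  for ze, vt, vs in zip(z_edit, v_tar, v_src)]
    return z_edit


v = toy_velocity({"sunny street": [0.0, 0.0], "foggy street": [0.5, -0.5]})
edited = edit_path([0.2, 0.4], "sunny street", "foggy street", v,
                   [1.0, 0.75, 0.5, 0.25, 0.0])  # approximately [0.7, -0.1]
```

In this toy model the sampled noise cancels in the velocity difference, and the Euler steps telescope so the edit moves the input by exactly `means[tar] - means[src]`; with a real text-to-image flow model the velocity would come from the network and the path would also change image content that the prompts describe.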
Problem

Research questions and friction points this paper is trying to address.

Addresses out-of-distribution robustness in autonomous driving 3D detection
Overcomes limitations of inversion-based and rectified-flow image editing methods
Preserves accurate 3D object geometry while enhancing training data diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rectified Flow Adaptation for training data enhancement
High-frequency alignment loss preserves 3D object geometry
Dual-frequency optimization balances editing flexibility and consistency
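The two strategies above act on different image regions and frequency bands. As a hypothetical sketch, not DriveFlow's published objective, such terms could be computed as masked squared errors over precomputed low- and high-frequency components. Here `fg_mask` marks foreground pixels (for example, pixels inside projected 3D boxes, which is an assumption), the foreground term aligns only the high-frequency band with the source, and the background term mixes low- and high-frequency alignment using illustrative weights `w_low` and `w_high`.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[int]]  # 1 = foreground (assumed: inside a projected 3D box), 0 = background


def masked_mse(a: Grid, b: Grid, mask: Mask, keep: int) -> float:
    """Mean squared difference over pixels whose mask value equals `keep` (0.0 if none)."""
    total, count = 0.0, 0
    for row_a, row_b, row_m in zip(a, b, mask):
        for va, vb, m in zip(row_a, row_b, row_m):
            if m == keep:
                total += (va - vb) ** 2
                count += 1
    return total / count if count else 0.0


def hf_foreground_loss(edit_high: Grid, src_high: Grid, fg_mask: Mask) -> float:
    """Hypothetical high-frequency alignment term: keep object edges and
    texture inside the foreground close to the source image."""
    return masked_mse(edit_high, src_high, fg_mask, keep=1)


def dual_frequency_background_loss(edit_low: Grid, edit_high: Grid,
                                   src_low: Grid, src_high: Grid, fg_mask: Mask,
                                   w_low: float = 0.5, w_high: float = 0.5) -> float:
    """Hypothetical background term: weighted low- and high-frequency alignment
    on background pixels; the weights are illustrative, not the paper's."""
    low_term = masked_mse(edit_low, src_low, fg_mask, keep=0)
    high_term = masked_mse(edit_high, src_high, fg_mask, keep=0)
    return w_low * low_term + w_high * high_term


mask = [[1, 0], [0, 0]]
src_low, edit_low = [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 1.5]]
src_high, edit_high = [[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]
print(hf_foreground_loss(edit_high, src_high, mask))  # 1.0
print(dual_frequency_background_loss(edit_low, edit_high, src_low, src_high, mask))
```

Lowering `w_high` relative to `w_low` in this sketch lets background texture drift further from the source while still constraining coarse appearance; in a full editor, such terms would be traded off against the editing signal itself, and the actual weighting and targets DriveFlow uses are not stated in this summary.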
Hongbin Lin
FNii-Shenzhen, SSE, CUHK-Shenzhen

Yiming Yang
FNii-Shenzhen, SSE, CUHK-Shenzhen

Chaoda Zheng
Xpeng Motors

Yifan Zhang
MiroMind AI

Shuaicheng Niu
Nanyang Technological University
Machine Learning, Domain Adaptation, Robustness, AutoML

Zilu Guo
FNii-Shenzhen, SSE, CUHK-Shenzhen

Yafeng Li
Baoji University of Arts and Sciences

Gui Gui
Central South University

Shuguang Cui
Distinguished Presidential Chair Professor, School of Science and Engineering, CUHKSZ
AI+Networking, Wireless Communications

Zhen Li
SSE, CUHK-Shenzhen, FNii-Shenzhen