3DRot: 3D Rotation Augmentation for RGB-Based 3D Tasks

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
RGB-based 3D vision tasks—such as 3D object detection and depth estimation—are hindered by two key challenges: the scarcity of densely annotated 3D data and the geometric inconsistency introduced by conventional image augmentations (e.g., scaling, rotation) that disregard camera projection constraints. To address this, we propose a depth-supervision-free geometrically consistent augmentation framework operating entirely in camera space. Our method jointly transforms RGB images, intrinsic camera parameters, object 6D poses, and 3D annotations—ensuring strict adherence to projective geometry during rotation and mirroring operations. The approach is plug-and-play and generalizes seamlessly across diverse 3D perception tasks. Evaluated on SUN RGB-D, it improves 3D detection IoU₃D by 1.30 points (43.21 → 44.51), mAP₀.₅ by 2.41 points (35.70 → 38.11), and reduces pose estimation rotation error to 20.93°, substantially outperforming existing augmentation strategies.
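The core idea, rotating about the camera's optical center, can be sketched as a pure camera-space transform: a rotation R of the camera induces the homography H = K R K⁻¹ on image pixels, so the image can be warped without any depth information while 3D geometry stays consistent. The following is a minimal illustrative sketch under a pinhole-camera assumption; the function names and the choice of a roll rotation are ours, not from the paper.

```python
import numpy as np

def roll_rotation(theta):
    """Rotation about the camera's optical (z) axis by angle theta (radians).

    Illustrative choice: a roll keeps the principal axis fixed, which is the
    simplest depth-free in-plane rotation.
    """
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def rotation_homography(K, R):
    """Pixel-space homography induced by rotating the camera by R.

    For any 3D point X, projecting the rotated point K (R X) equals applying
    H = K R K^-1 to the original projection K X, so warping the image by H
    preserves projective geometry without knowing scene depth.
    """
    return K @ R @ np.linalg.inv(K)
```

In practice the resulting H would be passed to an image-warping routine (e.g. `cv2.warpPerspective`), with object poses and 3D annotations rotated by the same R so labels stay aligned with the warped pixels.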

📝 Abstract
RGB-based 3D tasks, e.g., 3D detection, depth estimation, and 3D keypoint estimation, still suffer from scarce, expensive annotations and a thin augmentation toolbox, since most image transforms, including resize and rotation, disrupt geometric consistency. In this paper, we introduce 3DRot, a plug-and-play augmentation that rotates and mirrors images about the camera's optical center while synchronously updating RGB images, camera intrinsics, object poses, and 3D annotations to preserve projective geometry, achieving geometry-consistent rotations and reflections without relying on any scene depth. We validate 3DRot on a classical 3D task, monocular 3D detection. On the SUN RGB-D dataset, 3DRot raises $IoU_{3D}$ from 43.21 to 44.51, cuts rotation error (ROT) from 22.91$^\circ$ to 20.93$^\circ$, and boosts $mAP_{0.5}$ from 35.70 to 38.11. For comparison, Cube R-CNN, which trains on three additional datasets alongside SUN RGB-D with a similar mechanism and the same test set, increases $IoU_{3D}$ from 36.2 to 37.8 and boosts $mAP_{0.5}$ from 34.7 to 35.4. Because it operates purely through camera-space transforms, 3DRot is readily transferable to other 3D tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhances RGB-based 3D tasks with geometry-consistent augmentation
Addresses the scarcity and cost of 3D annotations
Improves monocular 3D detection performance metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

3DRot enables geometry-consistent image rotations
Synchronizes RGB, camera intrinsics, and 3D annotations
Plug-and-play augmentation without requiring scene depth
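The synchronization of 3D annotations named above can be sketched as follows: when the camera frame is rotated by R, every annotation expressed in camera coordinates must rotate with it. This is an illustrative sketch under our own conventions (camera-frame box centers and rotation-matrix object orientations); the function name and signature are hypothetical, not from the paper.

```python
import numpy as np

def update_annotations(R, center, R_obj):
    """Rotate a 3D box center and object orientation into the new camera frame.

    R      : 3x3 camera rotation applied by the augmentation
    center : 3-vector, box center in camera coordinates
    R_obj  : 3x3 object orientation (6D pose rotation) in camera coordinates

    Returns the center and orientation consistent with the warped image:
    positions rotate by R, and the object orientation is composed with R.
    """
    new_center = R @ center
    new_R_obj = R @ R_obj
    return new_center, new_R_obj
```

Because the same R drives the image homography, the intrinsics-space warp, and these annotation updates, labels remain pixel-aligned after augmentation with no depth map required.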