VERIA: Verification-Centric Multimodal Instance Augmentation for Long-Tailed 3D Object Detection

📅 2026-03-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the rare-class performance bottleneck in 3D object detection under long-tailed distributions in autonomous driving, where the available samples only sparsely cover the substantial intra-class diversity of rare classes. We propose VERIA, an image-first, verification-centric multimodal instance augmentation framework that uses off-the-shelf foundation models to synthesize synchronized RGB-LiDAR instances. Sequential semantic and geometric verification curates the synthesized instances, and this design tends to select instances that better match real LiDAR statistics while spanning a wider range of intra-class variation. A stage-wise yield decomposition provides a log-based diagnostic of pipeline reliability. On the nuScenes and Lyft datasets, VERIA improves rare-class 3D detection in both LiDAR-only and multimodal settings.

📝 Abstract

Long-tail distributions in driving datasets pose a fundamental challenge for 3D perception, as rare classes exhibit substantial intra-class diversity yet available samples cover this variation space only sparsely. Existing instance augmentation methods based on copy-paste or asset libraries improve rare-class exposure but are often limited in fine-grained diversity and scene-context placement. We propose VERIA, an image-first multimodal augmentation framework that synthesizes synchronized RGB-LiDAR instances using off-the-shelf foundation models and curates them with sequential semantic and geometric verification. This verification-centric design tends to select instances that better match real LiDAR statistics while spanning a wider range of intra-class variation. Stage-wise yield decomposition provides a log-based diagnostic of pipeline reliability. On nuScenes and Lyft, VERIA improves rare-class 3D object detection in both LiDAR-only and multimodal settings. Our code is available at https://sgvr.kaist.ac.kr/VERIA/.
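
To make the idea of sequential verification with stage-wise yield concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the two check functions, and their thresholds are hypothetical stand-ins chosen only for illustration. The sketch passes candidates through ordered filters and records, for each stage, the fraction of incoming candidates that survive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical synthesized instance; the fields are illustrative only."""
    class_name: str
    semantic_score: float   # assumed score for agreement with the target class
    num_lidar_points: int   # assumed point count of the synthesized LiDAR instance


Stage = Tuple[str, Callable[[Candidate], bool]]


def run_stages(
    candidates: Iterable[Candidate], stages: List[Stage]
) -> Tuple[List[Candidate], Dict[str, float]]:
    """Apply filters in order; record each stage's yield (survivors / inputs)."""
    survivors = list(candidates)
    yields: Dict[str, float] = {}
    for name, keep in stages:
        n_in = len(survivors)
        survivors = [c for c in survivors if keep(c)]
        yields[name] = len(survivors) / n_in if n_in else 0.0
    return survivors, yields


def semantic_ok(c: Candidate) -> bool:
    # Assumed threshold, not taken from the paper.
    return c.semantic_score >= 0.5


def geometric_ok(c: Candidate) -> bool:
    # Assumed plausibility proxy: require a minimum number of points.
    return c.num_lidar_points >= 5


candidates = [
    Candidate("rare_class", 0.9, 40),
    Candidate("rare_class", 0.3, 50),
    Candidate("rare_class", 0.8, 2),
    Candidate("rare_class", 0.7, 12),
]
kept, stage_yields = run_stages(
    candidates, [("semantic", semantic_ok), ("geometric", geometric_ok)]
)
for name, y in stage_yields.items():
    print(f"{name}: yield={y:.2f}")
```

Logging the yield per stage rather than only the final count shows which check rejects the most candidates, which is the kind of diagnostic a stage-wise decomposition is meant to provide.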

Problem

Research questions and friction points this paper is trying to address.

long-tailed distribution

3D object detection

rare-class augmentation

multimodal perception

instance diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal augmentation

verification-centric

long-tailed 3D detection