🤖 AI Summary
To address inefficient query initialization and degraded performance under occlusion and high-density scenarios in multimodal 3D detection, this paper proposes a LiDAR–image collaborative object-aware query initialization framework. Our method introduces three key innovations: (1) occlusion-aware center estimation, which jointly leverages point cloud clustering and BEV geometric modeling to improve localization accuracy for occluded objects; (2) cross-modal feature alignment–driven adaptive neighborhood sampling, enhancing geometric–semantic consistency across modalities; and (3) a dynamic query weighting mechanism that balances query distribution according to task relevance. Evaluated on the nuScenes benchmark, our approach consistently improves mainstream 3D detectors—achieving up to +0.9 mAP and +1.2 NDS over strong baselines—while demonstrating exceptional robustness in severely occluded and high-traffic scenes.
📝 Abstract
Recent query-based 3D object detection methods using camera and LiDAR inputs have shown strong performance, but existing query initialization strategies, such as random sampling or BEV heatmap-based sampling, often result in inefficient query usage and reduced accuracy, particularly for occluded or crowded objects. To address this limitation, we propose ALIGN (Advanced query initialization with LiDAR and Image GuidaNce), a novel approach for occlusion-robust, object-aware query initialization. Our model consists of three key components: (i) Occlusion-aware Center Estimation (OCE), which integrates LiDAR geometry and image semantics to estimate object centers accurately; (ii) Adaptive Neighbor Sampling (ANS), which generates object candidates from LiDAR clustering and supplements each object by sampling spatially and semantically aligned points around it; and (iii) Dynamic Query Balancing (DQB), which adaptively balances queries between foreground and background regions. Our extensive experiments on the nuScenes benchmark demonstrate that ALIGN consistently improves performance across multiple state-of-the-art detectors, achieving gains of up to +0.9 mAP and +1.2 NDS, particularly in challenging scenes with occlusions or dense crowds. Our code will be publicly available upon publication.
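To make the three-stage pipeline concrete, here is a minimal toy sketch of the OCE → ANS → DQB flow. All function names, thresholds, and the greedy clustering are illustrative assumptions, not the paper's actual implementation; the real method operates on fused LiDAR–image features rather than 2D toy points.

```python
import math


def _dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def estimate_centers(points, fg_scores, score_thr=0.5, merge_dist=1.0):
    # OCE (toy stand-in): keep points the semantic branch scores as
    # foreground, then greedily merge nearby survivors into centers.
    centers = []  # each entry: [[x, y], count]
    for p, s in zip(points, fg_scores):
        if s < score_thr:
            continue
        for c in centers:
            if _dist(p, c[0]) < merge_dist:
                n = c[1]
                c[0][0] = (c[0][0] * n + p[0]) / (n + 1)
                c[0][1] = (c[0][1] * n + p[1]) / (n + 1)
                c[1] += 1
                break
        else:
            centers.append([[p[0], p[1]], 1])
    return [tuple(c[0]) for c in centers]


def sample_neighbors(center, points, fg_scores, radius=2.0, k=4):
    # ANS (toy stand-in): gather up to k spatially close, high-scoring
    # points around an estimated center to enrich its query.
    cands = [(s, p) for p, s in zip(points, fg_scores)
             if _dist(p, center) <= radius]
    cands.sort(key=lambda t: -t[0])
    return [p for _, p in cands[:k]]


def balance_queries(n_total, n_fg_candidates, fg_ratio=0.75):
    # DQB (toy stand-in): split a fixed query budget between foreground
    # candidates and background exploration.
    n_fg = min(n_fg_candidates, int(n_total * fg_ratio))
    return n_fg, n_total - n_fg
```

For example, two nearby high-score points merge into one center, a distant high-score point becomes a second center, and a low-score point is discarded; the remaining query budget is assigned to background regions.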