🤖 AI Summary
To address inaccurate pose sharing and inconsistent feature alignment in multi-agent collaborative perception under GNSS-denied conditions, this paper proposes a robust vehicle-to-vehicle (V2V) cooperative perception framework. Methodologically, it eliminates GNSS dependency by leveraging LiDAR-based localization to establish globally consistent poses; introduces a lightweight Pose Generator with Confidence (PGC) module and a Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT) for precise, confidence-aware spatial alignment and temporal context modeling; and incorporates compact pose representation learning with uncertainty estimation. Furthermore, it introduces V2VLoc, the first multi-task simulation dataset supporting V2V cooperative localization and perception. Extensive experiments on V2VLoc and the real-world V2V4Real dataset demonstrate significant performance gains over state-of-the-art methods, validating the framework’s effectiveness, robustness, and cross-scenario generalizability.
📝 Abstract
Multiple agents rely on accurate poses to share and align observations, enabling collaborative perception of the environment. However, traditional GNSS-based localization often fails in GNSS-denied environments, making consistent feature alignment during collaboration difficult. To tackle this challenge, we propose a robust GNSS-free collaborative perception framework based on LiDAR localization. Specifically, we propose a lightweight Pose Generator with Confidence (PGC) to estimate compact pose and confidence representations. To alleviate the effects of localization errors, we further develop the Pose-Aware Spatio-Temporal Alignment Transformer (PASTAT), which performs confidence-aware spatial alignment while capturing essential temporal context. Additionally, we present a new simulation dataset, V2VLoc, which can be adapted for both LiDAR localization and collaborative detection tasks. V2VLoc comprises three subsets: Town1Loc, Town4Loc, and V2VDet. Town1Loc and Town4Loc offer multi-traversal sequences for training on the localization task, whereas V2VDet is intended specifically for the collaborative detection task. Extensive experiments on the V2VLoc dataset demonstrate that our approach achieves state-of-the-art performance under GNSS-denied conditions. We further conduct extended experiments on the real-world V2V4Real dataset to validate the effectiveness and generalizability of PASTAT.
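The abstract names confidence-aware spatial alignment but does not detail its mechanics. As a rough illustration of the general idea only (not the paper's actual PGC or PASTAT architecture), the sketch below warps a neighboring agent's bird's-eye-view (BEV) feature grid into the ego frame using an estimated 2D rigid relative pose, then fuses agents' features with weights proportional to each pose's confidence, so that poorly localized agents contribute less. All function names, the `(dx, dy, yaw)` pose parameterization, and the nearest-neighbor sampling are assumptions for illustration.

```python
import numpy as np

def warp_bev(feat, rel_pose, cell_size=1.0):
    """Warp a neighbor's BEV feature grid (H, W, C) into the ego frame.

    rel_pose is a hypothetical (dx, dy, yaw) 2D rigid transform from the
    neighbor frame to the ego frame; sampling is nearest-neighbor.
    """
    H, W, C = feat.shape
    dx, dy, yaw = rel_pose
    c, s = np.cos(yaw), np.sin(yaw)
    # Metric coordinates of each ego-grid cell centre (origin at grid centre).
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ex = (xs - W / 2) * cell_size
    ey = (ys - H / 2) * cell_size
    # Inverse transform: ego coordinates -> neighbor coordinates.
    nx = c * (ex - dx) + s * (ey - dy)
    ny = -s * (ex - dx) + c * (ey - dy)
    ix = np.round(nx / cell_size + W / 2).astype(int)
    iy = np.round(ny / cell_size + H / 2).astype(int)
    out = np.zeros_like(feat)
    valid = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
    out[valid] = feat[iy[valid], ix[valid]]
    return out

def fuse_with_confidence(ego_feat, neighbor_feats, rel_poses, confidences):
    """Confidence-weighted fusion: the ego feature gets unit weight, each
    warped neighbor feature is weighted by its scalar pose confidence."""
    acc = ego_feat.copy()
    total = 1.0
    for f, p, conf in zip(neighbor_feats, rel_poses, confidences):
        acc += conf * warp_bev(f, p)
        total += conf
    return acc / total
```

In a learned system the confidence would come from the pose estimator itself and the fusion would be attention-based rather than a weighted mean; this sketch only shows why down-weighting low-confidence poses limits the damage of misaligned features.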