SparseCoop: Cooperative Perception with Kinematic-Grounded Queries

📅 2025-12-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Cooperative perception faces several challenges: high communication overhead, inflexible spatiotemporal alignment across asynchronous vehicles and differing viewpoints, weak geometric modeling in sparse queries, suboptimal fusion strategies, and training instability. To address them, this paper proposes the first fully sparse, BEV-free cooperative perception framework. The method introduces three core innovations: (1) kinematics-aware instance-level anchor queries for explicit and interpretable spatiotemporal alignment; (2) a coarse-to-fine sparse aggregation module that makes cross-vehicle feature fusion more robust; and (3) a cooperative instance denoising training paradigm that improves convergence stability and delay resilience. Evaluated on V2X-Seq and Griffin, the approach achieves state-of-the-art performance in 3D detection and tracking while sharply reducing communication load, since only sparse queries are transmitted. It combines high accuracy, low computational cost, and strong temporal robustness.

📝 Abstract

Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approaches sharing dense Bird's-Eye-View (BEV) features are constrained by quadratically-scaling communication costs and the lack of flexibility and interpretability for precise alignment across asynchronous or disparate viewpoints. While emerging sparse query-based methods offer an alternative, they often suffer from inadequate geometric representations, suboptimal fusion strategies, and training instability. In this paper, we propose SparseCoop, a fully sparse cooperative perception framework for 3D detection and tracking that completely discards intermediate BEV representations. Our framework features a trio of innovations: a kinematic-grounded instance query that uses an explicit state vector with 3D geometry and velocity for precise spatio-temporal alignment; a coarse-to-fine aggregation module for robust fusion; and a cooperative instance denoising task to accelerate and stabilize training. Experiments on V2X-Seq and Griffin datasets show SparseCoop achieves state-of-the-art performance. Notably, it delivers this with superior computational efficiency, low transmission cost, and strong robustness to communication latency. Code is available at https://github.com/wang-jh18-SVM/SparseCoop.
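
The alignment idea in the abstract can be illustrated with a minimal sketch. It assumes a constant-velocity motion model and a planar rigid transform between the sender and receiver frames; neither is confirmed by the abstract, and the actual state layout and motion model in SparseCoop may differ. The names `InstanceState`, `compensate_latency`, and `to_receiver_frame` are hypothetical and do not come from the released code.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class InstanceState:
    """Hypothetical explicit state carried by one instance query."""
    x: float       # box center in the sender's frame (m)
    y: float
    z: float
    length: float  # box size (m)
    width: float
    height: float
    yaw: float     # heading (rad)
    vx: float      # planar velocity (m/s)
    vy: float


def compensate_latency(state: InstanceState, dt: float) -> InstanceState:
    """Roll the center forward by dt seconds (assumed constant velocity)."""
    return replace(state, x=state.x + state.vx * dt, y=state.y + state.vy * dt)


def to_receiver_frame(state: InstanceState, tx: float, ty: float,
                      theta: float) -> InstanceState:
    """Apply an assumed planar rigid transform: rotate by theta, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    # Wrap the rotated heading back into [-pi, pi).
    yaw = (state.yaw + theta + math.pi) % (2.0 * math.pi) - math.pi
    return replace(
        state,
        x=c * state.x - s * state.y + tx,
        y=s * state.x + c * state.y + ty,
        yaw=yaw,
        vx=c * state.vx - s * state.vy,  # velocity rotates but does not translate
        vy=s * state.vx + c * state.vy,
    )
```

Because the state is an explicit vector rather than a latent feature, a receiver can apply both steps in closed form: first `compensate_latency` with the measured communication delay, then `to_receiver_frame` with the relative pose between the two agents.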

Problem

Research questions and friction points this paper is trying to address.

Overcomes single-vehicle limitations such as occlusions and constrained fields of view

Addresses high communication costs and alignment issues in cooperative perception

Improves geometric representation and training stability in sparse query methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Kinematic-grounded instance query for precise spatio-temporal alignment

Coarse-to-fine aggregation module for robust fusion

Cooperative instance denoising task to accelerate and stabilize training
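
As a rough illustration of what a cooperative instance denoising task could look like, the sketch below builds noisy queries from ground-truth boxes and pairs each with its clean target. The noise model is an assumption, not the paper's: it shifts each center backward along its velocity by a random simulated delay and adds uniform position jitter, in the general spirit of denoising queries used in DETR-style detectors. The function name `make_denoising_pairs`, its parameters, and the dictionary keys are all hypothetical.

```python
import random


def make_denoising_pairs(gt_boxes, pos_noise=0.5, max_delay=0.3, seed=0):
    """Return (noisy_query, clean_target) pairs for an auxiliary denoising loss.

    Assumed noise model (not taken from the paper):
      - a random delay in [0, max_delay] seconds moves the center backward
        along its velocity, imitating a stale message from another agent;
      - uniform jitter in [-pos_noise, pos_noise] meters is added per axis.
    """
    rng = random.Random(seed)  # local generator keeps the output reproducible
    pairs = []
    for gt in gt_boxes:
        delay = rng.uniform(0.0, max_delay)
        noisy = dict(gt)  # copy so the ground truth is left untouched
        noisy["x"] = gt["x"] - gt["vx"] * delay + rng.uniform(-pos_noise, pos_noise)
        noisy["y"] = gt["y"] - gt["vy"] * delay + rng.uniform(-pos_noise, pos_noise)
        pairs.append((noisy, gt))
    return pairs
```

During training, a model would receive the noisy queries alongside its regular ones and be supervised to recover the clean targets, which gives a direct learning signal that does not depend on query-to-object matching.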
