GT-Space: Enhancing Heterogeneous Collaborative Perception with Ground Truth Feature Space

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of feature misalignment and ineffective fusion among heterogeneous agents—such as those with diverse sensors or model architectures—in collaborative perception for autonomous driving. To this end, the authors propose a general-purpose feature alignment framework that leverages ground-truth labels to construct a unified shared feature space (GT-Space). Each agent projects its features into this common space via a lightweight adapter, eliminating the need for pairwise interaction or retraining of encoders. A cross-modal contrastive learning–driven fusion mechanism further enhances performance while significantly improving system scalability and deployment flexibility. Extensive experiments on the OPV2V, V2XSet, and RCooper datasets demonstrate that the proposed method consistently achieves superior detection accuracy and robustness compared to state-of-the-art approaches.

📝 Abstract
In autonomous driving, multi-agent collaborative perception enhances sensing capabilities by enabling agents to share perceptual data. A key challenge lies in handling heterogeneous features from agents equipped with different sensing modalities or model architectures, which complicates data fusion. Existing approaches often require retraining encoders or designing interpreter modules for pairwise feature alignment, but these solutions are not scalable in practice. To address this, we propose GT-Space, a flexible and scalable collaborative perception framework for heterogeneous agents. GT-Space constructs a common feature space from ground-truth labels, providing a unified reference for feature alignment. With this shared space, agents only need a single adapter module to project their features, eliminating the need for pairwise interactions with other agents. Furthermore, we design a fusion network trained with contrastive losses across diverse modality combinations. Extensive experiments on simulation datasets (OPV2V and V2XSet) and a real-world dataset (RCooper) demonstrate that GT-Space consistently outperforms baselines in detection accuracy while delivering robust performance. Our code will be released at https://github.com/KingScar/GT-Space.
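The core idea described in the abstract, each agent using a single lightweight adapter to project its features into a shared ground-truth-derived space, trained with a contrastive loss, can be illustrated with a minimal sketch. This is not the authors' implementation: the linear `adapter`, the InfoNCE-style loss, and all shapes and names below are illustrative assumptions, with GT-Space anchors stood in by random vectors.

```python
import numpy as np

def adapter(features, W, b):
    # Hypothetical lightweight adapter: a single linear projection
    # from an agent's native feature dimension into the shared space.
    return features @ W + b

def info_nce(anchors, projected, temperature=0.1):
    # InfoNCE-style contrastive loss (an assumption; the paper's exact
    # losses are not specified here). Each projected agent feature should
    # match its corresponding shared-space anchor (positives on the diagonal).
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = projected / np.linalg.norm(projected, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
gt_anchors = rng.normal(size=(4, 8))    # stand-in for GT-derived reference features
agent_feats = rng.normal(size=(4, 16))  # one heterogeneous agent's raw features
W = rng.normal(size=(16, 8)) * 0.1      # adapter weights (would be learned)
b = np.zeros(8)
loss = info_nce(gt_anchors, adapter(agent_feats, W, b))
```

Because every agent aligns to the same fixed anchors rather than to each other, adding an agent requires training only its own adapter, which is the scalability argument the abstract makes against pairwise interpreter modules.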
Problem

Research questions and friction points this paper is trying to address.

collaborative perception
heterogeneous agents
feature alignment
multi-agent perception
autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

GT-Space
heterogeneous collaborative perception
ground truth feature space
feature alignment
contrastive learning
Wentao Wang
Sun Yat-sen University, Shenzhen Campus, Shenzhen, China
Haoran Xu
Sun Yat-sen University, Shenzhen Campus, Shenzhen, China; Peng Cheng Laboratory (PCL), Shenzhen, China
Guang Tan
School of Intelligent Systems Engineering, Sun Yat-sen University
Machine Learning · Mobile Computing · Networking