One Model to Translate Them All: Universal Any-to-Any Translation for Heterogeneous Collaborative Perception

📅 2026-05-18

🤖 AI Summary
This work addresses the challenge of feature fusion in heterogeneous multi-agent collaborative perception, where modality discrepancies hinder effective integration. To this end, the authors propose UniTrans, a universal framework enabling zero-shot feature translation across arbitrary modalities. UniTrans leverages a pre-trained expert parameter bank and an intrinsic modality encoder to extract scene-invariant, modality-specific representations, modeling cross-modal mappings within a shared intrinsic latent space. A dynamic blending mechanism adaptively synthesizes the expert combination coefficients for any source–target modality pair at inference time, allowing immediate adaptation without fine-tuning or data sharing. Experiments show that UniTrans consistently outperforms existing methods on both the OPV2V-H and DAIR-V2X benchmarks, covering simulated and real-world scenarios.
📝 Abstract
By sharing intermediate features, collaborative perception extends each agent's sensing beyond standalone limits, but real-world feature modality heterogeneity remains a key barrier to effective fusion. Most existing methods, including direct adaptation and protocol-based transformation, rely on training adapters for newly emerging feature modalities and often require additional retraining or fine-tuning. Such repeated training is costly and is often infeasible across manufacturers due to model and data privacy constraints, limiting real-world scalability. To address this issue, we propose UniTrans, a universal any-to-any feature modality translation model that instantiates translators on the fly for arbitrary modalities. UniTrans pretrains a bank of translator expert parameters and learns their combination coefficients as a function of source-to-target modality mapping. The mapping is measured in a modality-intrinsic latent space, where an intrinsic encoder extracts modality-specific yet scene-invariant codes from single-frame intermediate features, enabling UniTrans to instantiate translators in a zero-shot manner. Experiments on OPV2V-H and DAIR-V2X demonstrate that UniTrans consistently outperforms state-of-the-art methods in both simulated and real-world settings, enabling efficient any-to-any translation through a universal model. The code is available at https://github.com/CheeryLeeyy/UniTrans.
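The expert-bank idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the encoder, the form of the experts, or how coefficients are computed, so everything below is an assumption made for illustration. In particular, the names `expert_bank`, `intrinsic_code`, `blend_coefficients`, and `instantiate_translator` are hypothetical, the experts are assumed to be plain channel-mixing matrices, the blending is assumed to be a softmax, and all weights are random rather than pretrained.

```python
import numpy as np

rng = np.random.default_rng(0)

C = 8    # feature channels (assumed equal for source and target)
K = 4    # number of experts in the bank (hypothetical)
D = 16   # size of the intrinsic modality code (hypothetical)

# Assumption: each expert is a C x C channel-mixing matrix (like a 1x1 conv).
expert_bank = rng.normal(size=(K, C, C)) / np.sqrt(C)

# Random stand-ins for the learned encoder and coefficient predictor.
W_enc = rng.normal(size=(2 * C, D))
W_gate = rng.normal(size=(2 * D, K))


def intrinsic_code(feat):
    """Map a (C, H, W) feature map to a (D,) code.

    Spatial pooling discards scene layout, a crude stand-in for the
    scene-invariant, modality-specific code described in the abstract.
    """
    stats = np.concatenate([feat.mean(axis=(1, 2)), feat.std(axis=(1, 2))])
    return np.tanh(stats @ W_enc)


def blend_coefficients(src_code, tgt_code):
    """Softmax coefficients over the K experts for one source-target pair."""
    logits = np.concatenate([src_code, tgt_code]) @ W_gate
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


def instantiate_translator(src_feat, tgt_feat):
    """Build a translator from one frame per modality, with no training step."""
    alpha = blend_coefficients(intrinsic_code(src_feat), intrinsic_code(tgt_feat))
    weight = np.tensordot(alpha, expert_bank, axes=1)  # (K,) x (K, C, C) -> (C, C)
    return alpha, weight


def translate(src_feat, weight):
    """Apply the blended channel mixing at every spatial location."""
    return np.einsum("oc,chw->ohw", weight, src_feat)


# Two synthetic "modalities" with different feature statistics.
src = rng.normal(size=(C, 32, 32))
tgt = rng.normal(loc=1.0, scale=2.0, size=(C, 32, 32))

alpha, weight = instantiate_translator(src, tgt)
out = translate(src, weight)
```

The point of the sketch is the data flow: a single frame from each modality is enough to produce codes, the codes determine coefficients, and the translator is assembled from fixed bank parameters, so no gradient step is needed when a new modality pair appears.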
Problem

Research questions and friction points this paper is trying to address.

collaborative perception
feature modality heterogeneity
any-to-any translation
zero-shot adaptation
universal model
Innovation

Methods, ideas, or system contributions that make the work stand out.

universal translation
heterogeneous collaborative perception
zero-shot modality translation
modality-intrinsic latent space
any-to-any feature fusion
Yang Li
Beijing University of Posts and Telecommunications, Beijing 100876, China
mobile edge computing, computing offloading, resource allocation, user collaboration, LLM
Weize Li
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Quan Yuan
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Congzhang Shao
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Guiyang Luo
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Yunqi Ba
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Xuanhan Zhu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Xinyuan Ding
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Xiaoyuan Fu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Jinglin Li
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China