AFFormer: Adaptive Feature Fusion Transformer for V2X Cooperative Perception under Channel Impairments

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited robustness of V2X cooperative perception under communication channel impairments such as noise, fading, and interference. The authors propose AFFormer, a Transformer-based adaptive feature fusion framework built from three modules: Multi-Agent and Temporal Aggregation fuses context across agents and over time, Dual Spatial Attention models spatial dependencies, and Uncertainty-Guided Fusion applies entropy-driven refinement to the fused features. A teacher-student knowledge distillation scheme further improves robustness by aligning fused features with early-collaboration supervision. On the V2XSet and DAIR-V2X datasets, the method consistently outperforms existing approaches under both ideal and impaired communication conditions while maintaining a competitive efficiency-accuracy trade-off.
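To make "channel impairments" concrete, here is a minimal sketch of how shared features might be corrupted in a simulation. It assumes a simple model that is not taken from the paper: each feature value is scaled by a Rayleigh-distributed fading gain and then perturbed by additive white Gaussian noise at a chosen signal-to-noise ratio. Interference is not modeled. The names `impair_features` and `snr_db` are hypothetical.

```python
import math
import random


def impair_features(features, snr_db, rng, fading=True):
    """Return a corrupted copy of a flat list of feature values.

    Assumed model (illustrative only): per-value Rayleigh fading gain with
    unit mean-square, followed by additive white Gaussian noise whose
    variance is set from the clean signal power and the requested SNR.
    """
    # Average power of the clean features.
    power = sum(x * x for x in features) / len(features)
    # Noise standard deviation that yields the requested SNR in dB.
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    corrupted = []
    for x in features:
        if fading:
            # Magnitude of a complex Gaussian with total variance 1,
            # which is Rayleigh-distributed with E[g^2] = 1.
            sigma = math.sqrt(0.5)
            gain = math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        else:
            gain = 1.0
        corrupted.append(gain * x + rng.gauss(0.0, noise_std))
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatability
    clean = [math.sin(0.1 * i) for i in range(8)]
    print(impair_features(clean, snr_db=10.0, rng=rng))
```

Lowering `snr_db` or enabling `fading` degrades the features a receiving agent would fuse, which is the kind of corruption a robust fusion scheme has to tolerate.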

📝 Abstract

Accurate 3D object detection is essential for ensuring the safety of autonomous vehicles. Cooperative perception, which leverages vehicle-to-everything (V2X) communication to share perceptual data, enhances detection but is vulnerable to channel impairments such as noise, fading, and interference. To strengthen the reliability of intelligent transportation systems, this paper proposes the Adaptive Feature Fusion Transformer (AFFormer), a Transformer-based framework that improves the robustness of V2X cooperative perception under common channel impairments. AFFormer mitigates the adverse effects of corrupted features by modeling temporal, inter-agent, and spatial correlations. It introduces three key modules: Multi-Agent and Temporal Aggregation for context-aware fusion across agents and over time, Dual Spatial Attention for efficient modeling of spatial dependencies, and Uncertainty-Guided Fusion for entropy-driven refinement of fused features. A teacher-student knowledge distillation strategy further enhances robustness by aligning fused features with reliable early-collaboration supervision. AFFormer is validated on the V2XSet and DAIR-V2X datasets, where it consistently outperforms existing methods under both ideal and impaired communication conditions, demonstrating improved robustness to communication-induced feature degradation while maintaining a competitive efficiency-accuracy trade-off.
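The abstract does not specify how the entropy-driven refinement is computed. The sketch below shows one generic way entropy can guide fusion, offered only as an illustration and not as the AFFormer module: each agent supplies a per-cell feature value and a per-cell confidence probability, and agents whose confidence has lower binary entropy receive larger softmax weights. The function names, the `temperature` parameter, and the data layout are all assumptions.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_weighted_fusion(features, confidences, temperature=1.0):
    """Fuse per-agent features cell by cell using entropy-based weights.

    features[a][i]    : feature value from agent a at cell i
    confidences[a][i] : probability in [0, 1] from agent a at cell i
    Hypothetical scheme: weights are a softmax over negative entropy, so a
    more certain agent (lower entropy) contributes more to the fused value.
    """
    n_agents = len(features)
    n_cells = len(features[0])
    fused = []
    for i in range(n_cells):
        scores = [
            -binary_entropy(confidences[a][i]) / temperature
            for a in range(n_agents)
        ]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        fused.append(
            sum(e / total * features[a][i] for a, e in enumerate(exps))
        )
    return fused


if __name__ == "__main__":
    feats = [[1.0, 1.0], [3.0, 3.0]]      # two agents, two cells
    confs = [[0.99, 0.90], [0.50, 0.90]]  # agent 0 is more certain at cell 0
    print(entropy_weighted_fusion(feats, confs))
```

When both agents are equally uncertain, the weights are equal and the result is a plain average; when one agent's confidence is near 0.5 (maximum entropy), its features are down-weighted. A smaller `temperature` sharpens that preference.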

Problem

Research questions and friction points this paper is trying to address.

V2X cooperative perception

channel impairments

3D object detection

feature fusion

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Feature Fusion

Transformer

V2X Cooperative Perception