AI Summary
To address the high computational cost and insufficient cross-scenario robustness of multi-sensor object-level fusion in autonomous driving, this paper proposes HiLO, the first Transformer-based high-level object fusion framework. HiLO tackles these challenges through three key innovations: (1) the first integration of Transformers into object-level fusion, enabling end-to-end learnable cross-modal feature alignment; (2) coupling with a modified Adapted Kalman Filter (AKF) to support dynamic confidence-weighted fusion; and (3) a domain-agnostic architecture designed for both urban and highway scenarios. Evaluated on a large-scale, in-house real-world dataset, HiLO achieves a 25.9-percentage-point improvement in F1 score and a 6.1-percentage-point gain in mean IoU over state-of-the-art baselines, demonstrating significantly enhanced generalization capability and computational efficiency.
Abstract
The fusion of sensor data is essential for a robust perception of the environment in autonomous driving. Learning-based fusion approaches mainly use feature-level fusion to achieve high performance, but their complexity and hardware requirements limit their applicability in near-production vehicles. High-level fusion methods offer robustness with lower computational requirements. Traditional methods, such as the Kalman filter, dominate this area. This paper modifies the Adapted Kalman Filter (AKF) and proposes a novel transformer-based high-level object fusion method called HiLO. Experimental results demonstrate improvements of $25.9$ percentage points in $\textrm{F}_1$ score and $6.1$ percentage points in mean IoU. Evaluation on a new large-scale real-world dataset demonstrates the effectiveness of the proposed approaches. Their generalizability is further validated by cross-domain evaluation between urban and highway scenarios. Code, data, and models are available at https://github.com/rst-tu-dortmund/HiLO .
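The reported gains are stated in percentage points of F1 score and mean IoU. The sketch below illustrates the standard textbook definitions of these quantities and shows how a percentage-point difference is computed. It is not taken from the HiLO repository: the function names, the axis-aligned box format `(x_min, y_min, x_max, y_max)`, and the example numbers are assumptions made for illustration only. The paper's actual matching criteria and box parametrization may differ.

```python
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from detection counts: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def iou_axis_aligned(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two axis-aligned boxes (x_min, y_min, x_max, y_max).

    Assumption: axis-aligned boxes are used here for simplicity;
    fused objects in practice are often oriented boxes.
    """
    # Overlap extent along each axis, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def percentage_point_gain(baseline: float, improved: float) -> float:
    """Absolute difference between two scores in [0, 1], in percentage points."""
    return 100.0 * (improved - baseline)


if __name__ == "__main__":
    # Hypothetical counts and boxes, not results from the paper.
    print(f1_score(tp=8, fp=2, fn=2))
    print(iou_axis_aligned((0, 0, 2, 2), (1, 1, 3, 3)))
    print(percentage_point_gain(0.500, 0.759))
```

Note that a percentage-point gain is an absolute difference between two scores, not a relative improvement: moving from a hypothetical F1 of 0.500 to 0.759 is a gain of 25.9 percentage points, which would correspond to a relative increase of about 52%.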