🤖 AI Summary
This paper addresses the cost-performance trade-off in roadside perception systems arising from heterogeneous high- and low-resolution multi-modal sensors. Method: We propose a cost-effective deployment framework featuring (i) a novel integer-programming-driven joint sensor placement optimization, and (ii) a modular multi-modal fusion architecture inspired by human multisensory integration, enabling precise alignment and feature fusion between 4D mmWave radar (with velocity information) and sparse, low-resolution LiDAR point clouds. Contribution/Results: Our work systematically challenges the "higher resolution is always better" paradigm, revealing the complementary roles of information dimensionality (e.g., velocity) and spatial resolution. Experiments demonstrate a 1.5% improvement in mAP across six traffic participant classes and a 14% gain in pedestrian AP. Moreover, our framework achieves performance comparable to all-high-resolution LiDAR baselines at significantly lower cost and is agnostic to the backbone network architecture.
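The joint placement step can be read as a set-cover-style integer program: binary variables select a sensor type per candidate site, the objective minimizes total hardware cost, and constraints require every coverage cell to be observed by at least one installed sensor. Below is a minimal, hypothetical sketch of that idea in Python with PuLP; the site names, relative costs, and coverage ranges are illustrative assumptions, not the paper's actual data or formulation.

```python
import pulp

# Sketch of a joint sensor placement ILP (assumed formulation):
# x[s, t] = 1 iff a sensor of type t is installed at candidate site s.
cells = range(12)                       # road segment discretised into coverage cells
sites = {"A": 2, "B": 6, "C": 10}       # candidate mounting sites -> cell index of the pole (assumed)
costs = {"hi_lidar": 10.0, "lo_lidar": 3.0, "radar4d": 1.5}   # relative unit costs (assumed)
reach = {"hi_lidar": 4, "lo_lidar": 2, "radar4d": 3}          # half-width of covered cells (assumed)

# Cells covered by each (site, sensor type) combination.
covers = {(s, t): {c for c in cells if abs(c - pos) <= reach[t]}
          for s, pos in sites.items() for t in costs}

prob = pulp.LpProblem("joint_sensor_placement", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", covers.keys(), cat="Binary")

# Objective: minimise total deployment cost.
prob += pulp.lpSum(costs[t] * x[s, t] for s, t in covers)

# Coverage constraint: every cell must be seen by at least one placed sensor.
for c in cells:
    prob += pulp.lpSum(x[s, t] for (s, t), cov in covers.items() if c in cov) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(s, t) for (s, t) in covers if x[s, t].value() > 0.5])
```

In this toy instance the solver favours cheaper radar units as long as their combined footprints still cover every cell, which mirrors the coverage-versus-cost comparison the simulation tool is meant to automate.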
📝 Abstract
Balancing cost and performance is crucial when choosing between high- and low-resolution point-cloud roadside sensors. For example, LiDAR delivers dense point clouds, while 4D millimeter-wave radar, though spatially sparser, embeds velocity cues that help distinguish objects and comes at a lower price. Unfortunately, sensor placement strategies influence point-cloud density and distribution across the coverage area. Compounding this first challenge, different sensor mixtures often demand distinct neural network architectures to maximize their complementary strengths. Without an evaluation framework that establishes a benchmark for comparison, it is imprudent to claim whether marginal gains result from higher resolution and new sensing modalities or from the algorithms. We present an ex-ante evaluation that addresses both challenges. First, we developed a simulation tool that builds on integer programming to automatically compare different sensor placement strategies against coverage and cost jointly. Additionally, inspired by human multi-sensory integration, we propose a modular framework to assess whether reductions in spatial resolution can be compensated by informational richness when detecting traffic participants. Extensive experiments with the proposed framework show that fusing velocity-encoded radar with low-resolution LiDAR yields marked gains (a 14 percent AP improvement for pedestrians and an overall mAP improvement of 1.5 percent across six categories) at lower cost than high-resolution LiDAR alone. Notably, these gains hold regardless of the specific deep neural modules employed in our framework. The result challenges the prevailing assumption that high-resolution sensors are always superior to low-resolution alternatives.
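To make the modular, backbone-agnostic fusion idea concrete, the following is a minimal PyTorch sketch of one plausible mid-level design: each modality is encoded on a shared bird's-eye-view grid (the radar channels carrying velocity), the per-modality features are concatenated and mixed, and the fused map can be handed to an arbitrary detection backbone. The channel counts, grid size, and layer choices are assumptions for illustration and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    """Per-modality BEV encoder (assumed design: a small conv stack)."""
    def __init__(self, in_ch, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class RadarLidarFusion(nn.Module):
    """Illustrative mid-level fusion: encode each modality on a shared BEV grid,
    then concatenate and mix before handing off to any detection backbone."""
    def __init__(self, lidar_ch=4, radar_ch=6, fused_ch=128):
        super().__init__()
        self.lidar_enc = ModalityEncoder(lidar_ch)
        self.radar_enc = ModalityEncoder(radar_ch)   # extra channels carry radial velocity etc.
        self.mix = nn.Sequential(
            nn.Conv2d(128, fused_ch, 1), nn.BatchNorm2d(fused_ch), nn.ReLU(),
        )

    def forward(self, lidar_bev, radar_bev):
        fused = torch.cat([self.lidar_enc(lidar_bev), self.radar_enc(radar_bev)], dim=1)
        return self.mix(fused)   # fused BEV features for the downstream detector

# Toy usage with assumed BEV tensor shapes (batch, channels, H, W).
fusion = RadarLidarFusion()
lidar_bev = torch.randn(1, 4, 200, 200)   # sparse low-res LiDAR occupancy/intensity features
radar_bev = torch.randn(1, 6, 200, 200)   # 4D radar features, including velocity channels
print(fusion(lidar_bev, radar_bev).shape)  # torch.Size([1, 128, 200, 200])
```

Because the fusion module only produces a feature map, any BEV detection head can consume its output, which is the sense in which such a design stays agnostic to the backbone network.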