🤖 AI Summary
To achieve real-time multi-object detection and velocity estimation for a 1:10-scale autonomous racing platform on resource-constrained embedded CPUs, this paper proposes TinyCenterSpeed, a lightweight center-point-based detection framework. We introduce a hardware-adaptive architecture that pairs an embedded CPU with an external Tensor Processing Unit (TPU): the CPU handles lightweight preprocessing, while the TPU executes the main backbone inference. Building on CenterPoint, we apply model pruning, quantization, and hardware-software co-optimization. The optimized system achieves a single-frame inference latency of only 7.88 ms while maintaining detection accuracy, and reduces CPU utilization 8.3-fold compared to the baseline. Relative to the state-of-the-art Adaptive Breakpoint Detector, our approach improves detection and velocity estimation by up to 61.38% and enables simultaneous detection of multiple opponents with high-precision velocity estimation. To the best of our knowledge, this is the first work to achieve real-time, robust learned perception on a purely CPU-plus-TPU heterogeneous embedded platform.
📝 Abstract
Perception within autonomous driving is nearly synonymous with Neural Networks (NNs). Yet the domain of autonomous racing is often characterized by scaled, computationally limited robots, used for cost-effectiveness and safety. For this reason, opponent detection and tracking systems typically resort to traditional computer vision techniques due to computational constraints. This paper introduces TinyCenterSpeed, a streamlined adaptation of the seminal CenterPoint method, optimized for real-time performance on 1:10-scale autonomous racing platforms. The adaptation is viable even on On-Board Computers (OBCs) powered solely by Central Processing Units (CPUs), as it offloads inference to an external Tensor Processing Unit (TPU). We demonstrate that, compared to the Adaptive Breakpoint Detector (ABD), the current State-of-the-Art (SotA) in scaled autonomous racing, TinyCenterSpeed not only improves detection and velocity estimation by up to 61.38% but also supports multi-opponent detection and estimation. It achieves real-time performance with an inference time of just 7.88 ms on the TPU, while reducing CPU utilization 8.3-fold.
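The quantization step mentioned in the optimization pipeline is commonly realized as per-tensor affine (scale + zero-point) int8 quantization, the scheme TPU-class accelerators expect. The sketch below is a minimal illustration of that general technique on a random weight tensor, not the paper's actual code; the helper names `quantize_int8` and `dequantize` are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Per-tensor affine quantization of float32 weights to int8 (illustrative)."""
    # Include 0.0 in the range so exact zeros (e.g. padding) remain exact.
    w_min = min(float(w.min()), 0.0)
    w_max = max(float(w.max()), 0.0)
    scale = (w_max - w_min) / 255.0  # int8 spans 255 quantization steps
    zero_point = int(round(-128 - w_min / scale))  # real value 0.0 maps here
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float32 values from int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

np.random.seed(0)
w = np.random.randn(64, 64).astype(np.float32)  # stand-in weight tensor
q, scale, zp = quantize_int8(w)
err = float(np.abs(dequantize(q, scale, zp) - w).max())
# Reconstruction error stays on the order of one quantization step (~scale).
```

Shrinking weights and activations to int8 like this is what allows the backbone to run on the external TPU while the CPU is left with only lightweight preprocessing.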