A Sparsity-Aware Autonomous Path Planning Accelerator with HW/SW Co-Design and Multi-Level Dataflow Optimization

📅 2025-07-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the computational intensity and poor real-time performance of path planning on resource-constrained autonomous driving platforms, this paper proposes an end-to-end FPGA acceleration framework tailored to quadratic programming (QP) solving. The method introduces a sparsity-aware memory architecture and a dedicated sparse matrix multiplication unit, integrating the alternating direction method of multipliers (ADMM) with a preconditioned conjugate gradient (PCG) solver. A hardware-software co-designed, multi-level pipelined parallel architecture is realized through dataflow restructuring and co-optimization of hardware-friendly iterative solvers. Evaluated on the AMD ZCU102 platform, the framework achieves state-of-the-art performance: a 1.48x speedup and a 2.05x throughput improvement over the best prior FPGA-based design, along with significantly higher energy efficiency than CPU- and GPU-based implementations. This work establishes an efficient, practical hardware-acceleration paradigm for real-time path planning at the edge.

📝 Abstract
Path planning is critical for autonomous driving, generating smooth, collision-free, feasible paths based on perception and localization inputs. However, its computationally intensive nature poses significant challenges for resource-constrained autonomous driving hardware. This paper presents an end-to-end FPGA-based acceleration framework targeting quadratic programming (QP), the core of optimization-based path planning. We employ a hardware-friendly alternating direction method of multipliers (ADMM) for QP solving and a parallelizable preconditioned conjugate gradient (PCG) method for the linear systems. By analyzing sparse matrix patterns, we propose customized storage schemes and efficient sparse matrix multiplication units, significantly reducing resource usage and accelerating matrix operations. Our multi-level dataflow optimization strategy incorporates intra-operator parallelization and pipelining, inter-operator fine-grained pipelining, and CPU-FPGA system-level task mapping. Implemented on the AMD ZCU102 platform, our framework achieves state-of-the-art latency and energy efficiency, including 1.48x faster performance than the best FPGA-based design, 2.89x over an Intel i7-11800H CPU, 5.62x over an ARM Cortex-A57 embedded CPU, and 1.56x over a state-of-the-art GPU solution, along with a 2.05x throughput improvement over existing FPGA-based designs.
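The abstract's "customized storage schemes and efficient sparse matrix multiplication units" are not specified at this level of detail. As an illustrative software baseline for what such a unit computes, here is a compressed sparse row (CSR) conversion and sparse matrix-vector multiply in NumPy; CSR is a standard scheme, not necessarily the paper's customized one, and the function names are invented for this sketch.

```python
import numpy as np

def dense_to_csr(A, eps=0.0):
    """Compress a dense matrix into CSR arrays:
    vals (nonzero values), cols (their column indices), ptr (row start offsets)."""
    vals, cols, ptr = [], [], [0]
    for row in A:
        nz = np.nonzero(np.abs(row) > eps)[0]
        vals.extend(row[nz])
        cols.extend(nz)
        ptr.append(len(vals))          # each row owns vals[ptr[i]:ptr[i+1]]
    return np.array(vals), np.array(cols), np.array(ptr)

def csr_matvec(vals, cols, ptr, x):
    """y = A @ x in CSR form. Each row's dot product touches only that row's
    nonzeros and is independent of the others, so rows can be computed in
    parallel -- the property a hardware SpMV unit exploits."""
    y = np.zeros(len(ptr) - 1)
    for i in range(len(y)):
        start, end = ptr[i], ptr[i + 1]
        y[i] = vals[start:end] @ x[cols[start:end]]
    return y
```

For the structured, banded matrices that arise in path-planning QPs, a format specialized to the known pattern (as the paper proposes) can avoid even the index storage that general CSR requires.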
Problem

Research questions and friction points this paper is trying to address.

Accelerates path planning for autonomous driving hardware
Optimizes sparse matrix operations for resource efficiency
Improves latency and energy efficiency in FPGA-based systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

HW/SW co-design for path planning acceleration
Sparse matrix optimization for resource efficiency
Multi-level dataflow with parallel and pipelined execution