GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

📅 2025-10-08
🤖 AI Summary
Problem: Real-time Model Predictive Control (MPC) for edge deployment requires solving batches of tens to low hundreds of nonlinear trajectory optimization problems under strict latency constraints; existing GPU-accelerated approaches do not deliver real-time performance and high batch throughput at the same time. Method: The paper proposes an algorithm–software–hardware co-design framework: (i) a hierarchical parallelization scheme that combines block-level, warp-level, and thread-level concurrency within and across solves; (ii) a dynamic memory optimization strategy to maximize GPU resource utilization; and (iii) a general-purpose nonlinear optimization kernel implemented in CUDA. Contribution/Results: Simulated benchmarks show speedups of 18–21× over CPU baselines and 1.4–16× over GPU baselines as batch size increases. Case studies show improved disturbance rejection and convergence behavior, and the solver is validated on hardware using an industrial manipulator.

📝 Abstract
While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing GPU-accelerated approaches typically (i) parallelize a single solve to meet real-time deadlines, (ii) scale to very large batches at slower-than-real-time rates, or (iii) achieve speed by restricting model generality (e.g., point-mass dynamics or a single linearization). This leaves a large gap in solver performance for many state-of-the-art MPC applications that require real-time batches of tens to low-hundreds of solves. As such, we present GATO, an open source, GPU-accelerated, batched TO solver co-designed across algorithm, software, and computational hardware to deliver real-time throughput for these moderate batch size regimes. Our approach leverages a combination of block-, warp-, and thread-level parallelism within and across solves for ultra-high performance. We demonstrate the effectiveness of our approach through a combination of: simulated benchmarks showing speedups of 18-21x over CPU baselines and 1.4-16x over GPU baselines as batch size increases; case studies highlighting improved disturbance rejection and convergence behavior; and finally a validation on hardware using an industrial manipulator. We open source GATO to support reproducibility and adoption.
Problem

Research questions and friction points this paper is trying to address.

Solving batched nonlinear trajectory optimization problems online for robotics MPC
Meeting real-time deadlines for batches of tens to low hundreds of solves
Closing the gap left by existing GPU-accelerated approaches, which parallelize a single solve, scale to very large batches at slower-than-real-time rates, or restrict model generality
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-accelerated batched trajectory optimization solver
Combines block-, warp-, and thread-level parallelism
Designed for real-time throughput at moderate batch sizes (tens to low hundreds of solves)
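
To make the idea of combining block-, warp-, and thread-level parallelism more concrete, the following is a minimal sketch in plain Python that emulates one possible work decomposition on the CPU. The mapping used here (one block per solve, warps striding over knot points, lanes striding over state elements) is an assumption chosen for illustration, not GATO's documented scheme, and the names `assign_work`, `coverage`, and all problem sizes are hypothetical.

```python
from collections import Counter

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def assign_work(batch_size, knot_points, state_dim, threads_per_block):
    """Enumerate the (solve, knot, element) items handled by each emulated thread.

    Hypothetical mapping (an illustration, not GATO's actual layout):
    block -> one solve, warp -> knot points, lane -> state elements.
    """
    if threads_per_block < WARP_SIZE or threads_per_block % WARP_SIZE != 0:
        raise ValueError("threads_per_block must be a positive multiple of 32")
    warps_per_block = threads_per_block // WARP_SIZE
    assignments = {}
    for block in range(batch_size):  # block level: one block per solve
        for thread in range(threads_per_block):
            warp, lane = divmod(thread, WARP_SIZE)
            assignments[(block, thread)] = [
                (block, knot, elem)
                # warp level: each warp strides over the knot points
                for knot in range(warp, knot_points, warps_per_block)
                # thread level: each lane strides over the state elements
                for elem in range(lane, state_dim, WARP_SIZE)
            ]
    return assignments


def coverage(assignments):
    """Count how many times each work item was assigned across all threads."""
    return Counter(item for items in assignments.values() for item in items)


if __name__ == "__main__":
    # Illustrative sizes only: 4 solves, 16 knot points, 14 state elements.
    work = assign_work(batch_size=4, knot_points=16, state_dim=14,
                       threads_per_block=128)
    counts = coverage(work)
    print(len(counts), set(counts.values()))
```

With these illustrative sizes, every work item is assigned to exactly one emulated thread, and lanes 14 through 31 of each warp stay idle because the state has only 14 elements. Idle lanes of this kind are one reason a real kernel would tune how work is split across the three levels.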
Alexander Du
School of Engineering and Applied Science, Columbia University
Emre Adabag
School of Engineering and Applied Science, Columbia University; University of Michigan
Gabriel Bravo
Barnard College, Columbia University and Dartmouth College
Brian Plancher
Dartmouth College and Barnard College, Columbia University
Robotics, Optimization, Computer Systems, STEM Education, Embedded Machine Learning