🤖 AI Summary
This work addresses two problems in stochastic trajectory optimization: unstable updates, and the difficulty of handling non-differentiable or discontinuous cost functions. It does so by reinterpreting the STOMP algorithm from a variational inference perspective. It introduces a proximal inference framework that adds a KL divergence regularizer between successive Gaussian proposal distributions to the objective, yielding a closed-form mean update with a trust-region interpretation. The expectations in that update are estimated with importance-weighted Monte Carlo sampling, so the method works with arbitrary cost functions and requires no gradients. In robotic arm planning experiments, the method achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother trajectories and running twice as fast as competing stochastic methods. It also obtains higher reward than CEM and MPPI on contact-rich MuJoCo locomotion and manipulation tasks.
📝 Abstract
Stochastic trajectory optimization methods like STOMP enable planning with non-differentiable costs, offering substantial flexibility over gradient-based approaches. We show that STOMP implicitly minimizes the KL divergence from a Boltzmann trajectory distribution, revealing an elegant Variational Inference (VI) structure underlying its updates. Building on this insight, we propose the Proximal Inference for Stochastic Trajectory Optimization (PISTO) algorithm that stabilizes the updates by augmenting the objective with a KL regularization between successive Gaussian proposals. This proximal formulation admits a trust-region interpretation and yields closed-form mean updates computable as expectations under a surrogate distribution. We estimate these expectations via importance-weighted Monte Carlo sampling, producing a simple, derivative-free algorithm that inherits STOMP's ability to handle non-differentiable and discontinuous costs without modification. On robot arm motion planning benchmarks, PISTO achieves an 89% success rate, outperforming CHOMP (63%) and STOMP (68%), while producing shorter, smoother paths at twice the speed of competing stochastic methods. We further validate PISTO on contact-rich MuJoCo locomotion and manipulation tasks, where it consistently outperforms both CEM and MPPI baselines in reward.
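The kind of update the abstract describes can be illustrated with a small, derivative-free sketch. This is not the authors' implementation, and the abstract does not state the exact update rule. The sketch assumes an isotropic Gaussian proposal with a fixed standard deviation `sigma`, a Boltzmann target proportional to `exp(-cost / temperature)`, and an objective of the form KL(q || p*) + (1/eta) KL(q || q_k). The unconstrained minimizer of that objective is the geometric mixture p*^a q_k^(1-a) with a = eta / (1 + eta), which plays the role of the surrogate distribution here. All function names, parameter names, and the toy cost are hypothetical.

```python
import math
import random


def example_cost(traj):
    """Hypothetical cost: squared distance of each waypoint to 1.0, plus a
    discontinuous penalty if any waypoint crosses a 'wall' at 1.5."""
    cost = sum((x - 1.0) ** 2 for x in traj)
    if any(x > 1.5 for x in traj):
        cost += 10.0  # jump discontinuity; no gradient is ever taken
    return cost


def proximal_mean_update(mean, cost_fn, rng, sigma=0.3, temperature=0.1,
                         eta=1.0, n_samples=512):
    """One mean update: sample from the current proposal q_k = N(mean, sigma^2 I)
    and return the self-normalized importance-weighted mean under the assumed
    surrogate p*^a q_k^(1-a), with a = eta / (1 + eta)."""
    a = eta / (1.0 + eta)  # small eta -> small a -> stay close to q_k
    samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
               for _ in range(n_samples)]
    log_w = []
    for x in samples:
        sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
        # log(surrogate / q_k) up to a constant: a*log p*(x) - a*log q_k(x)
        log_w.append(-a * cost_fn(x) / temperature
                     + a * sq_dist / (2.0 * sigma ** 2))
    top = max(log_w)  # subtract the max before exponentiating for stability
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return [sum(w * x[i] for w, x in zip(weights, samples)) / total
            for i in range(len(mean))]


def optimize(mean, cost_fn, n_iters=40, seed=0, **kwargs):
    """Repeat the mean update with a seeded generator for reproducibility."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        mean = proximal_mean_update(mean, cost_fn, rng, **kwargs)
    return mean
```

Calling `optimize([0.0, 0.0, 0.0, 0.0], example_cost)` moves the four waypoints toward 1.0 using only cost evaluations, so the jump at the wall needs no special handling. Under these assumptions, `eta` acts as the trust-region knob: smaller values flatten the weights and keep each new mean closer to the previous one.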