Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics

📅 2026-04-23
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses low sampling efficiency in trajectory and policy optimization when the target distribution is sharp or multimodal. It casts controller design as inference: minimizing a KL-regularized expected trajectory cost yields a Boltzmann-tilted distribution over controller parameters that concentrates on low-cost solutions as the temperature decreases. The authors propose tempered sequential Monte Carlo (TSMC), which combines adaptive reweighting, resampling, and Hamiltonian Monte Carlo rejuvenation along a tempering path from a prior to the target, and extend it to policy optimization to handle the initial-state distribution and rollout randomness. Experiments on trajectory- and policy-optimization benchmarks indicate that the method is broadly applicable and compares favorably to state-of-the-art baselines.
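
To make the "Boltzmann-tilted" target concrete, the standard variational identity behind a KL-regularized objective can be written out as below. The symbols are assumptions introduced here for illustration and are not taken from the paper: \theta for controller parameters, J(\theta) for trajectory cost, p_0 for the prior, \lambda for the temperature, and \beta_k for the tempering schedule. The geometric path shown is one common choice; the paper's exact path may differ.

```latex
% KL-regularized objective and its minimizer (Gibbs variational principle)
\[
q^\star = \arg\min_{q}\; \mathbb{E}_{\theta \sim q}\big[J(\theta)\big]
          + \lambda\, \mathrm{KL}\big(q \,\|\, p_0\big)
\quad\Longrightarrow\quad
q^\star(\theta) \propto p_0(\theta)\, \exp\!\big(-J(\theta)/\lambda\big).
\]
% One common tempering path from the prior (beta = 0) to the target (beta = 1)
\[
\pi_{\beta}(\theta) \propto p_0(\theta)\, \exp\!\big(-\beta\, J(\theta)/\lambda\big),
\qquad 0 = \beta_0 < \beta_1 < \dots < \beta_K = 1.
\]
% Incremental importance weight of particle i when moving from beta_{k-1} to beta_k
\[
w_k^{(i)} \propto \exp\!\big(-(\beta_k - \beta_{k-1})\, J(\theta^{(i)})/\lambda\big).
\]
```

As \lambda decreases, q^\star places more mass near the minimizers of J, which is why the target becomes sharp and why a gradual path through intermediate \beta_k helps.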

📝 Abstract
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
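
The loop described in the abstract (adaptive reweighting, resampling, and HMC rejuvenation along a tempering path) can be sketched on a toy problem as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-well `cost`, its analytic gradient (a stand-in for differentiating through a rollout), the Gaussian prior, the ESS-bisection rule for choosing the next temperature, and all names and default values are hypothetical.

```python
import numpy as np

PRIOR_SIGMA = 2.0  # assumed prior scale (hypothetical)

def cost(theta):
    # Hypothetical toy cost with minima at +/-1 in each coordinate.
    return np.sum((theta**2 - 1.0) ** 2, axis=-1)

def grad_cost(theta):
    # Analytic gradient; stands in for gradients obtained through a rollout.
    return 4.0 * theta * (theta**2 - 1.0)

def ess(logw):
    # Effective sample size of unnormalized log-weights.
    w = np.exp(logw - logw.max())
    return w.sum() ** 2 / (w**2).sum()

def next_beta(beta, costs, lam, target_ess):
    # Largest next temperature whose incremental weights keep ESS >= target.
    if ess(-(1.0 - beta) * costs / lam) >= target_ess:
        return 1.0
    lo, hi = beta, 1.0
    for _ in range(50):  # bisection on the step size
        mid = 0.5 * (lo + hi)
        if ess(-(mid - beta) * costs / lam) >= target_ess:
            lo = mid
        else:
            hi = mid
    return min(1.0, max(lo, beta + 1e-8))  # guard: always make progress

def hmc_step(theta, beta, lam, rng, step=0.05, n_leapfrog=10):
    # One HMC move per particle, targeting prior * exp(-beta * cost / lam).
    def logp(x):
        return -0.5 * np.sum(x**2, axis=-1) / PRIOR_SIGMA**2 - beta * cost(x) / lam
    def grad(x):
        return -x / PRIOR_SIGMA**2 - beta * grad_cost(x) / lam
    with np.errstate(all="ignore"):  # diverging proposals are simply rejected
        p0 = rng.standard_normal(theta.shape)
        x, p = theta.copy(), p0 + 0.5 * step * grad(theta)
        for i in range(n_leapfrog):
            x = x + step * p
            # Full momentum step, except a half step on the final iteration.
            p = p + (step if i < n_leapfrog - 1 else 0.5 * step) * grad(x)
        h_new = logp(x) - 0.5 * np.sum(p**2, axis=-1)
        h_old = logp(theta) - 0.5 * np.sum(p0**2, axis=-1)
        accept = np.log(rng.uniform(size=len(theta))) < h_new - h_old
    return np.where(accept[:, None], x, theta)

def tempered_smc(n=512, dim=2, lam=0.5, ess_frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = PRIOR_SIGMA * rng.standard_normal((n, dim))  # particles from the prior
    beta, betas = 0.0, [0.0]
    while beta < 1.0:
        costs = cost(theta)
        new_beta = next_beta(beta, costs, lam, ess_frac * n)
        logw = -(new_beta - beta) * costs / lam          # incremental weights
        w = np.exp(logw - logw.max())
        w /= w.sum()
        theta = theta[rng.choice(n, size=n, p=w)]        # multinomial resampling
        beta = new_beta
        theta = hmc_step(theta, beta, lam, rng)          # rejuvenation
        betas.append(beta)
    return theta, betas

if __name__ == "__main__":
    particles, schedule = tempered_smc()
    print(len(schedule) - 1, "tempering stages; mean cost", cost(particles).mean())
```

The sketch resamples at every stage and applies a single HMC move per stage for brevity; a practical sampler would likely tune the step size, run several rejuvenation moves, and use a lower-variance resampling scheme. The policy-optimization extensions mentioned in the abstract (empirical initial-state approximation and the extended-space construction) are not shown.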
Problem

Research questions and friction points this paper is trying to address.

trajectory optimization
policy optimization
differentiable dynamics
sequential Monte Carlo
finite-horizon control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tempered Sequential Monte Carlo
Differentiable Dynamics
Trajectory Optimization
Policy Optimization
Hamiltonian Monte Carlo