An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control

📅 2024-10-07
📈 Citations: 3
Influential: 0
🤖 AI Summary
To address the high computational cost and memory overhead of high-dimensional, long-horizon stochastic optimal control (SOC), this paper proposes a backpropagation-free, on-policy deep learning framework. Methodologically, it introduces the Girsanov theorem into policy-gradient estimation, enabling gradients to be computed analytically from sample trajectories and thereby avoiding backpropagation through the numerical integration of stochastic differential equations and their adjoint systems. The approach extends to sampling via Schrödinger–Föllmer processes and to fine-tuning pre-trained diffusion models, preserving theoretical consistency while improving scalability. Experiments on canonical SOC benchmarks and distribution-sampling tasks demonstrate several-fold training speedups, substantial GPU-memory savings, and successful deployment in state spaces of up to 1,000 dimensions. This work establishes a computationally efficient, lightweight paradigm for large-scale stochastic control.
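The Girsanov change of measure that underlies the gradient derivation can be illustrated with a short sketch (a toy illustration under stated assumptions, not the paper's implementation): for dX = u(X) dt + dW, the log-density of the controlled path measure relative to the uncontrolled one is ∫ u dX − ½ ∫ |u|² dt, which is computable directly from a sample trajectory, with no differentiation through the SDE solver. The linear control `u` below is a hypothetical example.

```python
import numpy as np

def girsanov_log_weight(path, u, dt):
    """Log Radon-Nikodym derivative log(dP^u / dP^0) of the controlled
    path measure (drift u) with respect to the uncontrolled one (pure
    Brownian motion), evaluated along a discrete sample path via the
    Euler-Maruyama transition densities:
        log-ratio = sum_k [ u_k * dx_k - 0.5 * u_k^2 * dt ],
    the discrete analogue of  int u dX - 0.5 int |u|^2 dt."""
    x = path[:-1]
    dx = np.diff(path)
    uk = u(x)
    return np.sum(uk * dx - 0.5 * uk ** 2 * dt)

rng = np.random.default_rng(1)
dt, n = 0.01, 100
u = lambda x: 0.3 * x  # toy linear feedback control (assumption)

# simulate one controlled path dX = u(X) dt + dW
x = np.empty(n + 1)
x[0] = 1.0
for k in range(n):
    x[k + 1] = x[k] + u(x[k]) * dt + rng.normal(0.0, np.sqrt(dt))

lw = girsanov_log_weight(x, u, dt)
```

Because the weight depends on the trajectory only through sums of increments, it is cheap to evaluate and differentiate with respect to the control parameters, which is the property the framework exploits.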

📝 Abstract
We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger–Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
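The backpropagation-free idea can be sketched generically. The snippet below is a score-function ("REINFORCE") estimator for a 1-D linear-quadratic toy problem, not the paper's exact Girsanov-based estimator: the simulated states are treated as fixed samples, so the gradient touches only the explicit parameter dependence of the cost and the path log-likelihood, never the SDE solver. The linear feedback `theta * x` and quadratic costs are assumptions for illustration.

```python
import numpy as np

def grad_estimate(theta, x0=1.0, dt=0.01, n_steps=100, n_paths=4096):
    """Monte Carlo gradient of the discretized SOC objective
        J(theta) = E[ 0.5 * sum_k u_k^2 dt + X_T^2 ],
    for dX = u dt + dW with toy linear feedback u_theta(x) = theta * x.
    No backpropagation through the solver: the gradient acts only on
    (i) the explicit theta-dependence of the running cost and
    (ii) the Euler-Maruyama path log-likelihood."""
    rng = np.random.default_rng(0)
    x = np.full(n_paths, x0)
    cost = np.zeros(n_paths)      # C(path, theta)
    explicit = np.zeros(n_paths)  # d/dtheta of C at fixed path
    score = np.zeros(n_paths)     # d/dtheta log p_theta(path)
    for _ in range(n_steps):
        drift = theta * x
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        cost += 0.5 * drift ** 2 * dt
        explicit += theta * x ** 2 * dt
        # increment dX ~ N(drift*dt, dt), so
        # d/dtheta log p = (dX - drift*dt) * d(drift)/dtheta = dw * x,
        # the discrete analogue of the Girsanov term  int d_theta(u) dW
        score += dw * x
        x = x + drift * dt + dw
    cost += x ** 2                # terminal cost g(X_T) = X_T^2
    # batch-mean baseline for variance reduction (small O(1/N) bias)
    return np.mean(explicit + (cost - cost.mean()) * score)
```

For this toy linear SDE one can check analytically that dJ/dtheta at theta = 0 equals 2*T*x0**2 + T**2 = 3 for T = 1 and x0 = 1, so `grad_estimate(0.0)` should land near 3; memory use is constant in the horizon length because no computation graph over the trajectory is retained.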
Problem

Research questions and friction points this paper is trying to address.

Designing an efficient on-policy algorithm for stochastic optimal control
Avoiding expensive backpropagation through stochastic differential equations
Accelerating the optimization of neural network control policies
Innovation

Methods, ideas, or system contributions that make the work stand out.

On-policy algorithm for stochastic optimal control
Girsanov theorem for gradient computation
Efficient neural network policy optimization
Mengjian Hua
Courant Institute of Mathematical Sciences, New York University
Matthieu Lauriere
Shanghai Frontiers Science Center of AI and DL, NYU-ECNU Institute of Mathematical Sciences, NYU Shanghai, Shanghai, China 200124
Eric Vanden-Eijnden
Courant Institute of Mathematical Sciences, New York University
Applied and computational mathematics