🤖 AI Summary
This work addresses continuous-time stochastic control problems with explicit time dependence, jump shocks, and high-dimensional state spaces by proposing the first mesh-free actor-critic framework for this setting. Leveraging entropy regularization, a time-inhomogeneous little q-function, and an occupation-measure formulation, the method derives policy gradients that accommodate time-dependent drift, volatility, and jump terms. A key innovation is the use of conditional normalizing flows to model non-Gaussian stochastic policies over continuous action spaces, enabling exact likelihood evaluation while enhancing representational flexibility. The approach is validated on time-inhomogeneous linear-quadratic control, Merton portfolio optimization, and multi-agent games, demonstrating accurate approximation of optimal policies, favorable scaling in both state dimension and number of agents, and stable learning even in the presence of jump-induced discontinuities.
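For orientation, below is a schematic statement of the kind of entropy-regularized, time-inhomogeneous jump-diffusion control problem the summary describes. All symbols (b, σ, γ, r, g, λ, Ñ) are illustrative notation chosen here, not the paper's own:

```latex
\[
  dX_s = b(s, X_s, a_s)\,ds + \sigma(s, X_s, a_s)\,dW_s
       + \int_{\mathbb{R}^m} \gamma(s, X_{s^-}, a_{s^-}, z)\,\widetilde{N}(ds, dz),
\]
\[
  J^{\pi}(t, x) = \mathbb{E}\Big[\int_t^T \big( r(s, X_s, a_s)
       + \lambda\,\mathcal{H}\big(\pi(\cdot \mid s, X_s)\big) \big)\,ds
       + g(X_T) \,\Big|\, X_t = x \Big],
\]
```

where W is a Brownian motion, Ñ a compensated Poisson random measure driving the jumps, H the differential entropy of the policy, and λ > 0 the entropy-regularization temperature; the actor maximizes J^π while the critic estimates the associated little q-function.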
📝 Abstract
Continuous-time stochastic control with time-inhomogeneous jump-diffusion dynamics is central to finance and economics, but computing optimal policies is difficult under explicit time dependence, discontinuous shocks, and high dimensionality. We propose an actor-critic framework that serves as a mesh-free solver for entropy-regularized control problems and stochastic games with jumps. The approach is built on a time-inhomogeneous little q-function and an appropriate occupation measure, yielding a policy-gradient representation that accommodates time-dependent drift, volatility, and jump terms. To represent expressive stochastic policies in continuous action spaces, we parameterize the actor using conditional normalizing flows, enabling flexible non-Gaussian policies while retaining exact likelihood evaluation for entropy regularization and policy optimization. We validate the method on time-inhomogeneous linear-quadratic control, Merton portfolio optimization, and a multi-agent portfolio game, using explicit solutions or high-accuracy benchmarks. Numerical results demonstrate stable learning under jump discontinuities, accurate approximation of optimal stochastic policies, and favorable scaling with respect to dimension and number of agents.
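To make the actor parameterization concrete, here is a minimal PyTorch sketch of a conditional coupling-flow policy π(a | t, x) whose log-density is exact via the change-of-variables formula. The architecture, layer count, and every name (`FlowPolicy`, `ConditionalCoupling`, the hyperparameters) are illustrative assumptions, not the paper's implementation; the sketch also assumes an even action dimension for the half-split.

```python
import math
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling layer conditioned on context (time and state).

    Half of the action coordinates pass through unchanged; the other half
    receive a shift and log-scale computed from the untouched half plus the
    context, so log|det J| is the sum of the log-scales (exact likelihood).
    Assumes an even action dimension (illustrative simplification).
    """
    def __init__(self, act_dim, ctx_dim, hidden=64, flip=False):
        super().__init__()
        assert act_dim % 2 == 0, "sketch assumes an even action dimension"
        self.d, self.flip = act_dim // 2, flip
        self.net = nn.Sequential(
            nn.Linear(self.d + ctx_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * self.d),
        )

    def forward(self, z, ctx):
        z_id, z_tr = z[..., :self.d], z[..., self.d:]
        if self.flip:                              # alternate which half is transformed
            z_id, z_tr = z_tr, z_id
        shift, log_scale = self.net(torch.cat([z_id, ctx], -1)).chunk(2, -1)
        log_scale = torch.tanh(log_scale)          # keep scales numerically tame
        a_tr = z_tr * torch.exp(log_scale) + shift
        a = torch.cat([a_tr, z_id] if self.flip else [z_id, a_tr], -1)
        return a, log_scale.sum(-1)                # pushed sample, log|det J|

class FlowPolicy(nn.Module):
    """Stochastic policy pi(a | t, x): Gaussian base pushed through couplings."""
    def __init__(self, act_dim, state_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        ctx_dim = state_dim + 1                    # condition on (t, x)
        self.layers = nn.ModuleList(
            ConditionalCoupling(act_dim, ctx_dim, flip=bool(i % 2))
            for i in range(n_layers)
        )

    def sample(self, t, x):
        ctx = torch.cat([t, x], -1)
        z = torch.randn(x.shape[:-1] + (self.act_dim,))
        # log-density of the standard-Gaussian base sample
        log_prob = -0.5 * (z ** 2).sum(-1) - 0.5 * self.act_dim * math.log(2 * math.pi)
        a = z
        for layer in self.layers:                  # change of variables:
            a, log_det = layer(a, ctx)             # log pi(a) = log p(z) - sum log|det J|
            log_prob = log_prob - log_det
        return a, log_prob

# Usage: sample actions and their exact log-densities for a batch of (t, x);
# log_prob feeds both the entropy term and the policy-gradient estimator.
policy = FlowPolicy(act_dim=4, state_dim=8)
t, x = torch.rand(32, 1), torch.randn(32, 8)
a, log_prob = policy.sample(t, x)                  # shapes: (32, 4), (32,)
```

Only the forward (sampling) direction is needed here: since the action is drawn by pushing a base sample through the flow, its exact log-density comes for free, which is what the entropy-regularized objective requires.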