🤖 AI Summary
This work addresses continuous-time stochastic control problems with explicit time dependence, jump shocks, and high-dimensional state spaces by proposing the first mesh-free actor-critic framework for this setting. Leveraging entropy regularization, a time-inhomogeneous little q-function, and an occupation-measure formulation, the method derives policy gradients that accommodate time-dependent drift, volatility, and jump terms. A key innovation is the use of conditional normalizing flows to model non-Gaussian stochastic policies over continuous action spaces, enabling exact likelihood evaluation while enhancing representational flexibility. The approach is validated on time-inhomogeneous linear-quadratic control, Merton portfolio optimization, and multi-agent games, demonstrating accurate approximation of optimal policies, favorable scaling in both state dimension and number of agents, and stable learning even in the presence of jump-induced discontinuities.
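For orientation, below is a schematic statement of the kind of entropy-regularized, time-inhomogeneous jump-diffusion control problem the summary describes. All symbols (b, σ, γ, r, g, λ, Ñ) are illustrative notation chosen here, not the paper's own:

```latex
\[
  dX_s = b(s, X_s, a_s)\,ds + \sigma(s, X_s, a_s)\,dW_s
       + \int_{\mathbb{R}^m} \gamma(s, X_{s^-}, a_{s^-}, z)\,\widetilde{N}(ds, dz),
\]
\[
  J^{\pi}(t, x) = \mathbb{E}\Big[\int_t^T \big( r(s, X_s, a_s)
       + \lambda\,\mathcal{H}\big(\pi(\cdot \mid s, X_s)\big) \big)\,ds
       + g(X_T) \,\Big|\, X_t = x \Big],
\]
```

where W is a Brownian motion, Ñ a compensated Poisson random measure driving the jumps, H the differential entropy of the policy, and λ > 0 the entropy-regularization temperature; the actor maximizes J^π while the critic estimates the associated little q-function.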
📝 Abstract
Continuous-time stochastic control with time-inhomogeneous jump-diffusion dynamics is central to finance and economics, but computing optimal policies is difficult under explicit time dependence, discontinuous shocks, and high dimensionality. We propose an actor-critic framework that serves as a mesh-free solver for entropy-regularized control problems and stochastic games with jumps. The approach is built on a time-inhomogeneous little q-function and an appropriate occupation measure, yielding a policy-gradient representation that accommodates time-dependent drift, volatility, and jump terms. To represent expressive stochastic policies in continuous action spaces, we parameterize the actor using conditional normalizing flows, enabling flexible non-Gaussian policies while retaining exact likelihood evaluation for entropy regularization and policy optimization. We validate the method on time-inhomogeneous linear-quadratic control, Merton portfolio optimization, and a multi-agent portfolio game, using explicit solutions or high-accuracy benchmarks. Numerical results demonstrate stable learning under jump discontinuities, accurate approximation of optimal stochastic policies, and favorable scaling with respect to dimension and number of agents.
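To make the actor parameterization concrete, here is a minimal PyTorch sketch of a conditional coupling-flow policy π(a | t, x) whose log-density is exact via the change-of-variables formula. The architecture, layer count, and every name (`FlowPolicy`, `ConditionalCoupling`, the hyperparameters) are illustrative assumptions, not the paper's implementation; the sketch also assumes an even action dimension for the half-split.

```python
import math
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """Affine coupling layer conditioned on context (time and state).

    Half of the action coordinates pass through unchanged; the other half
    receive a shift and log-scale computed from the untouched half plus the
    context, so log|det J| is the sum of the log-scales (exact likelihood).
    Assumes an even action dimension (illustrative simplification).
    """
    def __init__(self, act_dim, ctx_dim, hidden=64, flip=False):
        super().__init__()
        assert act_dim % 2 == 0, "sketch assumes an even action dimension"
        self.d, self.flip = act_dim // 2, flip
        self.net = nn.Sequential(
            nn.Linear(self.d + ctx_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 2 * self.d),
        )

    def forward(self, z, ctx):
        z_id, z_tr = z[..., :self.d], z[..., self.d:]
        if self.flip:                              # alternate which half is transformed
            z_id, z_tr = z_tr, z_id
        shift, log_scale = self.net(torch.cat([z_id, ctx], -1)).chunk(2, -1)
        log_scale = torch.tanh(log_scale)          # keep scales numerically tame
        a_tr = z_tr * torch.exp(log_scale) + shift
        a = torch.cat([a_tr, z_id] if self.flip else [z_id, a_tr], -1)
        return a, log_scale.sum(-1)                # pushed sample, log|det J|

class FlowPolicy(nn.Module):
    """Stochastic policy pi(a | t, x): Gaussian base pushed through couplings."""
    def __init__(self, act_dim, state_dim, n_layers=4):
        super().__init__()
        self.act_dim = act_dim
        ctx_dim = state_dim + 1                    # condition on (t, x)
        self.layers = nn.ModuleList(
            ConditionalCoupling(act_dim, ctx_dim, flip=bool(i % 2))
            for i in range(n_layers)
        )

    def sample(self, t, x):
        ctx = torch.cat([t, x], -1)
        z = torch.randn(x.shape[:-1] + (self.act_dim,))
        # log-density of the standard-Gaussian base sample
        log_prob = -0.5 * (z ** 2).sum(-1) - 0.5 * self.act_dim * math.log(2 * math.pi)
        a = z
        for layer in self.layers:                  # change of variables:
            a, log_det = layer(a, ctx)             # log pi(a) = log p(z) - sum log|det J|
            log_prob = log_prob - log_det
        return a, log_prob

# Usage: sample actions and their exact log-densities for a batch of (t, x);
# log_prob feeds both the entropy term and the policy-gradient estimator.
policy = FlowPolicy(act_dim=4, state_dim=8)
t, x = torch.rand(32, 1), torch.randn(32, 8)
a, log_prob = policy.sample(t, x)                  # shapes: (32, 4), (32,)
```

Only the forward (sampling) direction is needed here: since the action is drawn by pushing a base sample through the flow, its exact log-density comes for free, which is what the entropy-regularized objective requires.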