🤖 AI Summary
This paper studies continuous-time reinforcement learning (RL) for stochastic control when the system dynamics follow a jump-diffusion process, which captures sudden, discontinuous moves in addition to diffusive noise. It formulates an entropy-regularized exploratory control problem with stochastic policies, extending the pure-diffusion setting of Wang et al. (2020) with a careful treatment of the jump part. The main findings are: (1) the policy evaluation and $q$-learning algorithms developed for controlled diffusions (Jia and Zhou, 2022a, 2023) can be used unchanged, without knowing in advance whether the data come from a pure diffusion or a jump-diffusion; (2) jumps do, in general, affect how actors and critics should be parameterized; (3) in mean-variance portfolio selection with a jump-diffusion stock price, both the RL algorithms and the parameterizations are invariant with respect to jumps; and (4) the general theory is applied in a detailed study of option hedging.
📝 Abstract
We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean--variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
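To make the phrase "stock price modelled as a jump-diffusion" concrete, the following is a minimal simulation sketch. It assumes a Merton-style model with lognormal jump sizes and constant coefficients, which is one common choice and not necessarily the specification used in the paper. The function and parameter names (`simulate_jump_diffusion`, `lam`, `jump_mean`, `jump_std`) are hypothetical and chosen only for illustration.

```python
import math
import random


def poisson_count(rng, mean):
    """Sample a Poisson count with Knuth's method (adequate for small means)."""
    threshold = math.exp(-mean)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count


def simulate_jump_diffusion(s0, mu, sigma, lam, jump_mean, jump_std,
                            horizon, n_steps, seed=0):
    """Simulate one price path on a uniform time grid.

    Assumed (illustrative) model for the log-price over a step of length dt:
      d(log S) = (mu - sigma^2 / 2) dt + sigma dW + sum of N(jump_mean, jump_std^2) jumps,
    where the number of jumps in the step is Poisson(lam * dt).
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    log_s = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment
        n_jumps = poisson_count(rng, lam * dt)      # jumps arriving in this step
        jump = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_s += (mu - 0.5 * sigma ** 2) * dt + sigma * dw + jump
        path.append(math.exp(log_s))
    return path


if __name__ == "__main__":
    # One year of daily steps with, on average, three jumps per year.
    path = simulate_jump_diffusion(100.0, 0.05, 0.2, 3.0, -0.02, 0.05, 1.0, 252)
    print(len(path), path[-1] > 0.0)
```

Setting `lam=0.0` switches the jumps off and leaves a pure diffusion. That gives a simple way to generate both kinds of data when checking the abstract's claim that the same algorithms apply to either case.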