Exploratory Randomization for Discrete-Time Risk-Sensitive Benchmarked Investment Management with Reinforcement Learning

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the optimal control problem in risk-sensitive portfolio management under both policy exploration and model parameter uncertainty. The authors formulate a discrete-time factor-based asset model and introduce an endogenous exploration mechanism via relative entropy-regularized Gaussian perturbations. By leveraging the free energy–entropy duality, the original problem is transformed into a linear-quadratic-Gaussian game. This approach establishes a theoretical bridge between risk-sensitive control and reinforcement learning, yielding concise optimality conditions and explicit exploration bounds. The resulting analytical solution reveals that the optimal policy takes a fractional Kelly form, with its exploration intensity jointly governed by the agent’s risk sensitivity, the covariance structure of asset returns, and the portfolio rebalancing frequency—thereby providing a theoretically grounded parametric family for policy gradient algorithms.
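The fractional Kelly structure described above can be illustrated with a small numerical sketch. Everything here is hypothetical scaffolding: the `1/(1 + theta)` shrinkage factor, the i.i.d. Gaussian perturbation, and the function names are stand-ins for the paper's exact formulas, which are not reproduced on this page.

```python
import numpy as np

def fractional_kelly_weights(mu_excess, Sigma, theta, noise_scale=0.0, rng=None):
    """Hypothetical sketch: fractional Kelly allocation with optional
    Gaussian exploration noise (not the paper's exact formulas)."""
    kelly = np.linalg.solve(Sigma, mu_excess)   # full Kelly weights: Sigma^{-1} mu
    base = kelly / (1.0 + theta)                # shrink toward cash as risk sensitivity grows
    if noise_scale > 0.0:
        rng = np.random.default_rng() if rng is None else rng
        base = base + noise_scale * rng.standard_normal(base.shape)
    return base

def kl_gaussian_shift(mean_shift, noise_scale):
    """KL(N(m1, s^2 I) || N(m0, s^2 I)) = ||m1 - m0||^2 / (2 s^2):
    the relative-entropy cost of shifting the exploratory policy's mean."""
    return float(np.dot(mean_shift, mean_shift)) / (2.0 * noise_scale ** 2)
```

With `theta = 0` the allocation reduces to the full Kelly portfolio `Sigma^{-1} mu`; larger `theta` scales the position down, matching the summary's claim that exploration and leverage are jointly governed by risk sensitivity and return covariance.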

📝 Abstract
This paper bridges reinforcement learning (RL) and risk-sensitive stochastic control by introducing a tractable exploration mechanism for policy search in risk-sensitive portfolio management, with known and unknown model parameters, that yields an endogenous relative-entropy regularization. We construct a discrete-time risk-sensitive benchmarked investment model that combines a factor-based asset universe with periodic portfolio rebalancing. Exploration is incorporated through user-specified Gaussian perturbations of the baseline (exploitative) controls. The risk-sensitive stochastic control problem is solved analytically using the free energy-entropy duality, which recasts the control problem as a linear-quadratic-Gaussian game and introduces a natural penalty for exploration. This approach yields simple sufficient conditions for optimality and induces intuitive bounds on exploration in terms of risk sensitivity, asset covariance, and rebalancing frequency. The optimal investment strategy can also be interpreted through the lens of fractional Kelly strategies. By connecting risk-sensitive control theory and RL, this work provides a principled parametric family for policy-gradient implementations, guiding the design of RL methods.
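The free energy-entropy duality the abstract invokes is, in its classical form, the Donsker-Varadhan variational formula. A standard statement (the paper's discrete-time benchmarked version is not shown on this page) is:

```latex
\frac{1}{\theta}\,\log \mathbb{E}_{P}\!\left[e^{\theta X}\right]
  \;=\; \sup_{Q \ll P}\Big\{\, \mathbb{E}_{Q}[X]
  \;-\; \tfrac{1}{\theta}\,\mathrm{KL}(Q \,\|\, P) \Big\},
  \qquad \theta > 0,
```

with the supremum attained by the Gibbs change of measure $dQ^{*}/dP \propto e^{\theta X}$. The $\mathrm{KL}(Q \,\|\, P)$ term is how a relative-entropy penalty on exploration arises endogenously from the risk-sensitive (exponential) criterion.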
Problem

Research questions and friction points this paper is trying to address.

risk-sensitive control
portfolio management
reinforcement learning
exploration
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

risk-sensitive control
reinforcement learning
Free Energy-Entropy Duality
exploration-exploitation
fractional Kelly strategy
Sebastien Lleo
Finance Department and ‘AI, Data Science & Business’ AE, NEOMA Business School, France
Wolfgang Runggaldier
Professor emeritus at the University of Padova
Mathematical finance