The Exploratory Multi-Asset Mean-Variance Portfolio Selection using Reinforcement Learning

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the continuous-time multi-asset mean–variance (MV) portfolio optimization problem in time-varying financial markets. We propose an exploration-enhanced reinforcement learning framework based on the Soft Actor–Critic (SAC) algorithm. Our contributions are threefold: (1) we introduce an exploratory Gaussian policy into SAC to improve policy robustness under market uncertainty; (2) we provide a theoretical convergence proof for the proposed policy iteration algorithm; and (3) we design a three-stage progressive parameter learning scheme that improves both the stability and the accuracy of multi-asset allocation. Experiments on both synthetic market environments and real-world financial data show that our method outperforms the classical MV solution and RL baselines on several metrics, including Sharpe ratio, volatility control, and terminal wealth distribution.
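As a point of reference for the metrics named above, the following is a minimal sketch of how annualized volatility and the Sharpe ratio are commonly computed from a series of per-period returns. The function names, the `periods_per_year=252` default, and the use of the sample standard deviation are assumptions made for illustration; the summary does not state the exact definitions used in the paper.

```python
import math


def annualized_volatility(returns, periods_per_year=252):
    """Sample standard deviation of per-period returns, scaled to one year."""
    n = len(returns)
    mean = sum(returns) / n
    # Unbiased sample variance (divides by n - 1).
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(var * periods_per_year)


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return divided by annualized volatility.

    `risk_free_rate` is an annual rate; it is spread evenly over the periods.
    """
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    mean_excess = sum(excess) / len(excess)
    vol = annualized_volatility(excess, periods_per_year)
    return mean_excess * periods_per_year / vol


if __name__ == "__main__":
    # Hypothetical daily returns, for illustration only.
    daily = [0.01, -0.01, 0.02, 0.0]
    print(annualized_volatility(daily), sharpe_ratio(daily))
```

With a zero risk-free rate, changing `periods_per_year` rescales the Sharpe ratio by the square root of the number of periods, which is why results are only comparable when the same convention is used throughout.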

📝 Abstract
In this paper, we study continuous-time multi-asset mean-variance (MV) portfolio selection in a time-varying financial market using a reinforcement learning (RL) algorithm, specifically the soft actor-critic (SAC) algorithm. A family of Gaussian portfolio selections is derived, and a policy iteration process is crafted to learn the optimal exploratory portfolio selection. We prove the convergence of the policy iteration process theoretically, and develop the SAC algorithm on that basis. To improve the algorithm's stability and learning accuracy in the multi-asset scenario, we divide the model parameters that influence the optimal portfolio selection into three parts and learn each part progressively. Numerical studies in simulated and real financial markets confirm the superior performance of the proposed SAC algorithm under various criteria.
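To make the idea of a Gaussian portfolio selection concrete, here is a minimal sketch of sampling an allocation from a Gaussian exploratory policy. The closed form used is an assumption: it is the natural multi-asset analogue of the constant-coefficient result known from the exploratory MV literature, not the time-varying policy derived in the paper, which the abstract does not spell out. All names (`gaussian_policy_params`, `sample_allocation`, `lam`, `w`) are hypothetical.

```python
import numpy as np


def gaussian_policy_params(t, x, w, mu, r, sigma, lam, T):
    """Mean and covariance of an assumed Gaussian exploratory policy.

    t, T  : current time and horizon
    x     : current wealth
    w     : Lagrange multiplier of the MV problem (acts as a wealth target)
    mu    : (d,) asset drifts; r: risk-free rate; sigma: (d, d) volatility matrix
    lam   : exploration (temperature) weight
    """
    cov_assets = sigma @ sigma.T
    inv_cov = np.linalg.inv(cov_assets)
    excess = mu - r
    # Squared market price of risk, (mu - r)^T (sigma sigma^T)^{-1} (mu - r).
    rho_sq = float(excess @ inv_cov @ excess)
    # Mean: invest more in risky assets the further wealth is below the target.
    mean = -(x - w) * (inv_cov @ excess)
    # Covariance: exploration shrinks as t approaches T (assumed form).
    cov = 0.5 * lam * np.exp(rho_sq * (T - t)) * inv_cov
    cov = 0.5 * (cov + cov.T)  # remove numerical asymmetry
    return mean, cov


def sample_allocation(rng, mean, cov):
    """Draw one vector of dollar allocations from the Gaussian policy."""
    return rng.multivariate_normal(mean, cov)


if __name__ == "__main__":
    # Hypothetical two-asset market, for illustration only.
    mu = np.array([0.08, 0.12])
    sigma = np.array([[0.2, 0.0], [0.06, 0.3]])
    mean, cov = gaussian_policy_params(0.0, 1.0, 1.4, mu, 0.02, sigma, 0.1, 1.0)
    print(sample_allocation(np.random.default_rng(0), mean, cov))
```

In this assumed form the policy mean depends on the state only through the gap between wealth and target, while the covariance depends only on time, so exploration is independent of the current wealth level.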

Problem

Research questions and friction points this paper is trying to address.

Solving multi-asset mean-variance portfolio selection in time-varying markets
Developing RL-based policy iteration for optimal exploratory portfolios
Enhancing algorithm stability in multi-asset learning scenarios
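
For orientation, the problem class listed above can be written down as follows. This is a sketch in the standard notation of the exploratory MV literature, offered as an assumption rather than as the paper's own formulation: $X_T$ is terminal wealth, $z$ the target expected return, $w$ a Lagrange multiplier, $\pi_t$ the density of the randomized allocation at time $t$, and $\lambda > 0$ the exploration weight.

```latex
% Classical MV problem: minimize terminal variance at a fixed expected wealth.
\min_{u}\ \operatorname{Var}[X_T^{u}]
\quad \text{subject to} \quad \mathbb{E}[X_T^{u}] = z .

% Lagrangian form, using Var[X] = E[(X - w)^2] - (w - z)^2 when E[X] = z.
\min_{u}\ \mathbb{E}\big[(X_T^{u} - w)^2\big] - (w - z)^2 .

% Exploratory (entropy-regularized) version over randomized policies \pi.
\min_{\pi}\ \mathbb{E}\Big[(X_T^{\pi} - w)^2
  + \lambda \int_0^T \!\! \int_{\mathbb{R}^d} \pi_t(u) \ln \pi_t(u)\, du\, dt \Big]
  - (w - z)^2 .
```

The added term is the negative differential entropy of the policy, so minimizing it rewards exploration; this kind of regularization is what leads to Gaussian optimal policies in the constant-coefficient setting.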

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses soft actor-critic RL algorithm
Divides model parameters into three parts
Proves policy iteration convergence theoretically