🤖 AI Summary
Problem: Deep reinforcement learning (DRL) agents for dynamic portfolio optimization suffer from poor interpretability and lack support for investor-specific, real-time risk intervention.
Method: This paper proposes a multi-critic DRL framework incorporating an interpretable reward factor matrix. It decomposes rewards at the asset level, introduces differentiable risk constraint terms, and enables fine-grained, asset-specific customization of risk aversion. Multiple critics jointly evaluate policy performance to enhance robustness.
Contribution/Results: Empirical evaluation across multiple markets demonstrates that the method significantly improves the Sharpe ratio (average +12.7%) and reduces maximum drawdown (average −18.3%). Crucially, it achieves transparency in decision-making by rendering reward attribution interpretable at the asset level and enabling real-time injection of personalized risk preferences. This work establishes a novel paradigm for explainable and controllable AI-driven financial advisory systems.
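The asset-level reward decomposition described above can be sketched as follows. This is a minimal illustration only: the function name, the choice of return and volatility contributions as the two factor rows, and the scalar aggregation are our assumptions, not the paper's actual construction.

```python
import numpy as np

def reward_factor_matrix(weights, asset_returns, asset_vols):
    """Illustrative asset-level reward factor matrix (hypothetical form).

    Row 0: each asset's contribution to portfolio return.
    Row 1: each asset's contribution to portfolio risk.
    """
    ret_factors = weights * asset_returns    # per-asset return contribution
    risk_factors = weights * asset_vols      # per-asset risk contribution
    return np.stack([ret_factors, risk_factors])  # shape (2, n_assets)

# Toy portfolio of three assets.
w = np.array([0.5, 0.3, 0.2])       # portfolio weights
r = np.array([0.01, -0.02, 0.005])  # one-step asset returns
v = np.array([0.15, 0.25, 0.10])    # per-asset volatility estimates

M = reward_factor_matrix(w, r, v)

# The scalar reward a conventional agent would see collapses this
# structure into a single number (penalty coefficient is illustrative).
scalar_reward = M[0].sum() - 0.1 * M[1].sum()
```

The point of keeping `M` instead of only `scalar_reward` is that each column attributes return and risk to a specific asset, which is what makes asset-level interpretation and intervention possible.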
📄 Abstract
Typical deep reinforcement learning (DRL) agents for dynamic portfolio optimization learn the factors influencing portfolio return and risk by analyzing the scalar output of the reward function while adjusting portfolio weights in the training environment. However, this approach faces a major limitation: it is difficult for investors to intervene in training according to their differing levels of risk aversion toward individual portfolio assets. The difficulty stems from a deeper issue: by learning only from the output of the reward function, existing DRL agents may not develop a thorough understanding of the factors responsible for portfolio return and risk. As a result, the strategy for determining the target portfolio weights depends entirely on the DRL agents themselves. To address these limitations, we propose a reward factor matrix that elucidates the return and risk of each asset in the portfolio. We further propose a novel learning system, Factor-MCLS, which uses a multi-critic framework to facilitate learning of the reward factor matrix. In this way, our DRL-based learning system can effectively learn the factors influencing portfolio return and risk. Moreover, building on the critic networks within the multi-critic framework, we develop a risk constraint term in the training objective of the policy function. This risk constraint term allows investors to intervene in the training of the DRL agent according to their individual levels of risk aversion toward the portfolio assets.
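As a rough illustration of how such a risk constraint term might enter the policy objective: the sketch below combines a multi-critic value estimate with a per-asset risk penalty weighted by investor-specified aversion coefficients. The min-aggregation over critics and the linear penalty form are our assumptions for illustration, not necessarily the paper's formulation.

```python
import numpy as np

def policy_objective(critic_values, risk_factors, risk_aversion):
    """Sketch of a risk-constrained policy objective (assumed form).

    critic_values: value estimates of the current action from each critic
    risk_factors:  per-asset risk terms, e.g. a row of the reward factor matrix
    risk_aversion: investor-specified per-asset risk-aversion coefficients

    In an actual DRL implementation these would be autodiff tensors so the
    penalty is differentiable w.r.t. policy parameters; plain NumPy is used
    here only to show the arithmetic.
    """
    value = np.min(critic_values)  # conservative multi-critic estimate (assumed aggregation)
    risk_penalty = float(np.dot(risk_aversion, risk_factors))
    return value - risk_penalty    # the policy maximizes value minus the risk term

obj = policy_objective(
    critic_values=np.array([1.0, 0.8, 0.9]),
    risk_factors=np.array([0.1, 0.2]),
    risk_aversion=np.array([1.0, 2.0]),
)
```

Raising an entry of `risk_aversion` lowers the objective for any policy that loads risk onto that asset, which is the mechanism by which an investor's per-asset preferences steer training.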