🤖 AI Summary
Problem: Deep reinforcement learning (DRL) agents for dynamic portfolio optimization suffer from poor interpretability and lack support for investor-specific, real-time risk intervention.
Method: This paper proposes a multi-critic DRL framework incorporating an interpretable reward factor matrix. It decomposes rewards at the asset level, introduces differentiable risk constraint terms, and enables fine-grained, asset-specific customization of risk aversion. Multiple critics jointly evaluate policy performance to enhance robustness.
Contribution/Results: Empirical evaluation across multiple markets demonstrates that the method significantly improves the Sharpe ratio (average +12.7%) and reduces maximum drawdown (average −18.3%). Crucially, it achieves transparency in decision-making by rendering reward attribution interpretable at the asset level and enabling real-time injection of personalized risk preferences. This work establishes a novel paradigm for explainable and controllable AI-driven financial advisory systems.
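The asset-level reward decomposition described above can be sketched as follows. This is a minimal illustration only: the function name, the choice of return and volatility contributions as the two factor rows, and the scalar aggregation are our assumptions, not the paper's actual construction.

```python
import numpy as np

def reward_factor_matrix(weights, asset_returns, asset_vols):
    """Illustrative asset-level reward factor matrix (hypothetical form).

    Row 0: each asset's contribution to portfolio return.
    Row 1: each asset's contribution to portfolio risk.
    """
    ret_factors = weights * asset_returns    # per-asset return contribution
    risk_factors = weights * asset_vols      # per-asset risk contribution
    return np.stack([ret_factors, risk_factors])  # shape (2, n_assets)

# Toy portfolio of three assets.
w = np.array([0.5, 0.3, 0.2])       # portfolio weights
r = np.array([0.01, -0.02, 0.005])  # one-step asset returns
v = np.array([0.15, 0.25, 0.10])    # per-asset volatility estimates

M = reward_factor_matrix(w, r, v)

# The scalar reward a conventional agent would see collapses this
# structure into a single number (penalty coefficient is illustrative).
scalar_reward = M[0].sum() - 0.1 * M[1].sum()
```

The point of keeping `M` instead of only `scalar_reward` is that each column attributes return and risk to a specific asset, which is what makes asset-level interpretation and intervention possible.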
📄 Abstract
Typical deep reinforcement learning (DRL) agents for dynamic portfolio optimization learn the factors influencing portfolio return and risk by analyzing the scalar output of the reward function while adjusting portfolio weights in the training environment. However, this approach faces a major limitation: it is difficult for investors to intervene in training according to their differing levels of risk aversion toward individual portfolio assets. The difficulty stems from a deeper issue: by learning only from the output of the reward function, existing DRL agents may not develop a thorough understanding of the factors responsible for portfolio return and risk. As a result, the strategy for determining the target portfolio weights depends entirely on the DRL agents themselves. To address these limitations, we propose a reward factor matrix that elucidates the return and risk of each asset in the portfolio. We further propose a novel learning system, Factor-MCLS, which uses a multi-critic framework to facilitate learning of the reward factor matrix. In this way, our DRL-based learning system can effectively learn the factors influencing portfolio return and risk. Moreover, building on the critic networks within the multi-critic framework, we develop a risk constraint term in the training objective of the policy function. This risk constraint term allows investors to intervene in the training of the DRL agent according to their individual levels of risk aversion toward the portfolio assets.
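As a rough illustration of how such a risk constraint term might enter the policy objective: the sketch below combines a multi-critic value estimate with a per-asset risk penalty weighted by investor-specified aversion coefficients. The min-aggregation over critics and the linear penalty form are our assumptions for illustration, not necessarily the paper's formulation.

```python
import numpy as np

def policy_objective(critic_values, risk_factors, risk_aversion):
    """Sketch of a risk-constrained policy objective (assumed form).

    critic_values: value estimates of the current action from each critic
    risk_factors:  per-asset risk terms, e.g. a row of the reward factor matrix
    risk_aversion: investor-specified per-asset risk-aversion coefficients

    In an actual DRL implementation these would be autodiff tensors so the
    penalty is differentiable w.r.t. policy parameters; plain NumPy is used
    here only to show the arithmetic.
    """
    value = np.min(critic_values)  # conservative multi-critic estimate (assumed aggregation)
    risk_penalty = float(np.dot(risk_aversion, risk_factors))
    return value - risk_penalty    # the policy maximizes value minus the risk term

obj = policy_objective(
    critic_values=np.array([1.0, 0.8, 0.9]),
    risk_factors=np.array([0.1, 0.2]),
    risk_aversion=np.array([1.0, 2.0]),
)
```

Raising an entry of `risk_aversion` lowers the objective for any policy that loads risk onto that asset, which is the mechanism by which an investor's per-asset preferences steer training.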