Portfolio Reinforcement Learning with Scenario-Context Rollout

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the degradation of portfolio rebalancing policies under distribution shifts induced by market regime shifts. The authors propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. Because scenario-based rewards from rollouts introduce a reward-transition mismatch in temporal-difference learning, the method constructs a counterfactual next state from the rollout-implied continuations and uses it to augment the critic's bootstrap target. This stabilizes critic training and provides a bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, SCR improves Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% relative to classic and RL-based rebalancing baselines.
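
The two headline metrics can be computed along the lines of the minimal sketch below. It assumes per-period simple returns, a zero risk-free rate, and 252 trading periods per year; these conventions and the function names `sharpe_ratio` and `max_drawdown` are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    # Sample standard deviation (ddof=1) of per-period returns.
    return r.mean() / r.std(ddof=1) * np.sqrt(periods_per_year)


def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a positive fraction."""
    r = np.asarray(returns, dtype=float)
    # Wealth curve starting from 1.0 so an initial loss counts as a drawdown.
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(wealth)
    drawdown = 1.0 - wealth / running_peak
    return float(drawdown.max())
```

Under these conventions, an "up to 53%" reduction in maximum drawdown would mean that the value returned by `max_drawdown` for the method is at most 53% smaller than the baseline's value on the same universe.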

📝 Abstract
Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. However, doing so raises a new challenge, because history never reveals what would have happened under a different scenario. As a result, incorporating scenario-based rewards from rollouts introduces a reward-transition mismatch in temporal-difference learning, destabilizing RL critic training. We analyze this inconsistency and show that it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state using the rollout-implied continuations and augment the critic agent's bootstrap target. Doing so stabilizes learning and provides a viable bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, our method improves Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% compared with classic and RL-based portfolio rebalancing baselines.
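
The reward-transition mismatch described above can be illustrated with a toy one-step TD target. The sketch below is not the paper's algorithm: the convex blend, the weight `lam`, and all function names are hypothetical, chosen only to show how pairing a scenario reward with the historical next state differs from pairing it with a counterfactual next state.

```python
def td_target(reward, next_value, gamma=0.99):
    """Standard one-step TD target: r + gamma * V(s')."""
    return reward + gamma * next_value


def mismatched_target(r_scenario, v_next_historical, gamma=0.99):
    """Scenario reward bootstrapped from the historical next state.

    The reward and the next-state value come from different transitions,
    which is the inconsistency the abstract calls a reward-transition mismatch.
    """
    return td_target(r_scenario, v_next_historical, gamma)


def blended_target(r_hist, v_next_hist, r_scenario, v_next_counterfactual,
                   lam=0.5, gamma=0.99):
    """Hypothetical convex blend of two internally consistent targets.

    lam = 0 recovers the purely historical target; lam = 1 uses only the
    scenario reward paired with its counterfactual next-state value.
    """
    historical = td_target(r_hist, v_next_hist, gamma)
    scenario = td_target(r_scenario, v_next_counterfactual, gamma)
    return (1.0 - lam) * historical + lam * scenario
```

In this toy form, `lam` plays the role of a bias-variance dial: larger values lean more on generated scenarios, smaller values lean more on observed history. How the paper actually weights or combines the two terms is not stated in the abstract.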
Problem

Research questions and friction points this paper is trying to address.

market regime shifts
distribution shifts
reward-transition mismatch
portfolio rebalancing
temporal-difference learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

scenario-context rollout
reward-transition mismatch
counterfactual next state
portfolio reinforcement learning
market regime shifts
Vanya Priscillia Bendatu
National University of Singapore
Yao Lu
National University of Singapore
AI systems