Portfolio Reinforcement Learning with Scenario-Context Rollout

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the performance degradation of portfolio rebalancing strategies under distribution shifts caused by market regime changes. To mitigate this, the authors propose macro-conditioned scenario-context rollout (SCR), which generates next-day multivariate return scenarios under stress events conditioned on macroeconomic states. It then constructs a counterfactual next state from the rollout-implied continuations to correct the reward-transition mismatch in temporal-difference learning. Augmenting the critic's bootstrap target in this way stabilizes reinforcement learning critic training and provides a workable bias-variance trade-off under distribution shift. In out-of-sample evaluations on 31 universes of U.S. equity and ETF portfolios, SCR improves the Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% relative to classic and RL-based rebalancing baselines.
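The two headline metrics have standard definitions. The following is a minimal sketch, assuming simple daily returns, 252 trading days per year, a zero risk-free rate by default, and that "up to 76%" and "up to 53%" are relative changes against each baseline. The summary does not state the paper's exact conventions, and the helper names are illustrative.

```python
import numpy as np


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of simple per-period returns (sample std, ddof=1).

    Assumed conventions: simple returns, 252 periods per year, constant risk-free rate.
    """
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    std = excess.std(ddof=1)
    if std == 0.0:
        return float("nan")  # undefined for a zero-volatility series
    return float(np.sqrt(periods_per_year) * excess.mean() / std)


def max_drawdown(daily_returns):
    """Largest peak-to-trough loss of the compounded wealth curve, as a positive fraction."""
    growth = np.cumprod(1.0 + np.asarray(daily_returns, dtype=float))
    wealth = np.concatenate([[1.0], growth])  # include initial wealth 1.0 as a possible peak
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))


def relative_change(new, base):
    """Hypothetical helper: relative change of a metric versus a baseline (+0.76 = 76% gain)."""
    return (new - base) / abs(base)


if __name__ == "__main__":
    # Wealth goes 1.0 -> 1.1 -> 0.55 -> 0.66, so the worst drawdown is 50%.
    print(max_drawdown([0.10, -0.50, 0.20]))
```

Under this reading, a 53% reduction in maximum drawdown corresponds to `relative_change(new_mdd, base_mdd) == -0.53`.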

📝 Abstract

Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. Doing so raises a new challenge: history never reveals what would have happened under a different scenario. As a result, incorporating scenario-based rewards from rollouts introduces a reward-transition mismatch in temporal-difference learning that destabilizes RL critic training. We analyze this inconsistency and show that it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state from the rollout-implied continuations and use it to augment the critic's bootstrap target. This stabilizes learning and provides a viable bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, our method improves the Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% compared with classic and RL-based portfolio rebalancing baselines.
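The abstract's core argument can be made concrete with a toy TD(0) target for a linear critic. This sketch is hypothetical throughout. The stress-conditioned Gaussian generator (`sample_scenarios`), the state design (`next_state`: drifted weights plus latest returns), the log-return reward, the linear value function, and the `lam` blend of historical and scenario targets are illustrative assumptions, not the paper's SCR formulation. It contrasts a mismatched target, which bootstraps scenario rewards from the historical next state, with a consistent one that pairs each scenario reward with its own counterfactual next state.

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed)


def sample_scenarios(mu, cov, stress, n_scenarios, rng, stress_scale=2.0):
    """Hypothetical macro-conditioned generator: a Gaussian whose covariance is
    inflated and whose mean is shifted down as `stress` goes from 0 to 1."""
    vol = np.sqrt(np.diag(cov))
    mean = mu - 0.5 * stress * vol
    scaled_cov = (1.0 + (stress_scale - 1.0) * stress) * cov
    draws = rng.multivariate_normal(mean, scaled_cov, size=n_scenarios)
    return np.clip(draws, -0.99, None)  # keep simple returns above -100%


def next_state(weights, returns):
    """Hypothetical state: drifted portfolio weights plus the latest returns.
    Accepts one return vector (N,) or a batch of scenarios (K, N)."""
    grown = weights * (1.0 + returns)
    drifted = grown / grown.sum(axis=-1, keepdims=True)
    return np.concatenate([drifted, returns], axis=-1)


def bootstrap_targets(theta, weights, r_hist, scenarios, lam=0.5, gamma=GAMMA):
    """TD(0) targets for a hypothetical linear critic V(s) = s @ theta."""

    def value(s):
        return s @ theta

    # Standard target: realized reward with the realized (historical) next state.
    s_hist = next_state(weights, r_hist)
    historical = np.log1p(r_hist @ weights) + gamma * value(s_hist)

    # Scenario rewards, one per rollout (log portfolio return).
    r_scen = np.log1p(scenarios @ weights)

    # Mismatched: scenario rewards bootstrapped from the historical next state,
    # i.e. a reward from one world and a continuation from another.
    mismatched = r_scen.mean() + gamma * value(s_hist)

    # Consistent: each scenario reward paired with its own counterfactual next state.
    s_cf = next_state(weights, scenarios)
    consistent = np.mean(r_scen + gamma * value(s_cf))

    # Blend (assumed form): lam = 0 is the one-sample historical target;
    # larger lam averages over K scenarios but inherits any generator bias.
    blended = (1.0 - lam) * historical + lam * consistent
    return {"historical": historical, "mismatched": mismatched,
            "consistent": consistent, "blended": blended}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_assets = 3
    mu = np.full(n_assets, 3e-4)
    cov = 1e-4 * (0.5 * np.eye(n_assets) + 0.5)  # 1% daily vol, correlation 0.5
    weights = np.full(n_assets, 1.0 / n_assets)
    theta = rng.normal(scale=0.1, size=2 * n_assets)
    r_hist = rng.multivariate_normal(mu, cov)
    scenarios = sample_scenarios(mu, cov, stress=1.0, n_scenarios=256, rng=rng)
    for name, y in bootstrap_targets(theta, weights, r_hist, scenarios).items():
        print(f"{name:>10s}: {y:+.5f}")
```

With `theta` set to zero (or `gamma = 0`), the mismatched and consistent targets coincide. They differ only in the bootstrap term gamma * V(s'), which is exactly where the choice of next state enters.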

Problem

Research questions and friction points this paper is trying to address.

market regime shifts

distribution shifts

reward-transition mismatch

portfolio rebalancing

temporal-difference learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

scenario-context rollout

reward-transition mismatch

counterfactual next state