🤖 AI Summary
This study addresses the performance degradation of portfolio rebalancing policies under the distribution shifts induced by market regime changes. To mitigate this, the authors propose macro-conditioned scenario-context rollout (SCR), which generates plausible multivariate return scenarios under stress events conditioned on macroeconomic states and constructs counterfactual next states to correct the reward-transition mismatch that scenario-based rewards introduce into temporal-difference learning. Augmenting the critic's bootstrap target with these counterfactual continuations stabilizes reinforcement learning critic training and provides a controllable bias-variance tradeoff under distribution shift. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, SCR improves the Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% relative to classic and RL-based rebalancing baselines.
📝 Abstract
Market regime shifts induce distribution shifts that can degrade the performance of portfolio rebalancing policies. We propose macro-conditioned scenario-context rollout (SCR), which generates plausible next-day multivariate return scenarios under stress events. Doing so, however, raises a new challenge: history never reveals what would have happened differently. As a result, incorporating scenario-based rewards from rollouts introduces a reward--transition mismatch in temporal-difference learning, destabilizing RL critic training. We analyze this inconsistency and show that it leads to a mixed evaluation target. Guided by this analysis, we construct a counterfactual next state from the rollout-implied continuations and use it to augment the critic's bootstrap target. Doing so stabilizes learning and provides a controllable bias-variance tradeoff. In out-of-sample evaluations across 31 distinct universes of U.S. equity and ETF portfolios, our method improves the Sharpe ratio by up to 76% and reduces maximum drawdown by up to 53% compared with classic and RL-based portfolio rebalancing baselines.
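The abstract does not spell out the exact form of the augmented bootstrap target. The following is a minimal sketch of the idea, assuming a convex combination of the observed one-step TD target and a counterfactual target built from the scenario reward and the rollout-implied next state; the function names, the toy linear critic, and the mixing weight `lam` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def critic(state, w):
    # Toy linear value function V(s) = w . s, standing in for the learned RL critic.
    return float(np.dot(w, state))

def mixed_td_target(r_obs, s_next, r_scen, s_next_cf, w, gamma=0.99, lam=0.5):
    # Observed one-step TD target: realized reward plus discounted value
    # of the observed next state.
    target_obs = r_obs + gamma * critic(s_next, w)
    # Counterfactual target: scenario-based reward paired with the
    # rollout-implied counterfactual next state, so reward and transition
    # come from the same (scenario) continuation.
    target_cf = r_scen + gamma * critic(s_next_cf, w)
    # lam interpolates the two: lam=0 recovers plain TD(0), while larger
    # lam leans on the scenario rollouts, trading variance for
    # scenario-induced bias (hypothetical mixing scheme).
    return (1.0 - lam) * target_obs + lam * target_cf

# Example with two-dimensional states and fixed critic weights.
w = np.array([0.5, -0.2])
s_next = np.array([1.0, 2.0])      # observed next state
s_next_cf = np.array([0.8, 1.5])   # counterfactual (rollout-implied) next state
target = mixed_td_target(r_obs=0.01, s_next=s_next,
                         r_scen=-0.03, s_next_cf=s_next_cf, w=w, lam=0.5)
```

Under this sketch, the critic regresses toward `target` instead of the plain observed TD target, which is one simple way a scenario-conditioned reward can be reconciled with a consistent next-state value.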