Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards

📅 2025-02-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the performance limitations of the conventional 60/40 stock-bond portfolio under realistic transaction costs and noisy market signals. It proposes a dynamic rebalancing framework based on deep reinforcement learning, using Proximal Policy Optimization (PPO) agents alongside Oracle agents. The framework introduces a regret-based, forward-looking Sharpe reward function, a transaction cost scheduler that limits signal loss from fee frictions, and training on synthetic time series generated by circular block bootstrapping. Because financial markets are highly stochastic, 20 independent agents are trained each period and their average performance is compared with the benchmark. The averaged policy improves on the 60/40 benchmark in the two reported measures, return and maximum drawdown, and shows strong results against other baselines.

📝 Abstract

This paper introduces a novel agent-based approach for enhancing existing portfolio strategies using Proximal Policy Optimization (PPO). Rather than focusing solely on traditional portfolio construction, our approach aims to improve an already high-performing strategy through dynamic rebalancing driven by PPO and Oracle agents. Our goal is to enhance the traditional 60/40 benchmark (60% stocks, 40% bonds) using a regret-based Sharpe reward function. To address the impact of transaction fee frictions and prevent signal loss, we develop a transaction cost scheduler. We introduce a future-looking reward function and train on synthetic data generated with a circular block bootstrap method to facilitate the learning of generalizable allocation strategies. We focus on two key evaluation measures: return and maximum drawdown. Given the high stochasticity of financial markets, we train 20 independent agents each period and evaluate their average performance against the benchmark. Our method not only enhances the performance of the existing portfolio strategy through strategic rebalancing but also demonstrates strong results compared to other baselines.
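
The circular block bootstrap mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `circular_block_bootstrap`, the fixed `block_size`, and the toy return series are assumptions made here for illustration. The idea is to draw random block starts and copy contiguous blocks, wrapping around the end of the series so that every observation is equally likely to be sampled while short-range dependence inside each block is preserved.

```python
import random

def circular_block_bootstrap(returns, block_size, n_samples, rng=None):
    """Resample a series in contiguous blocks that wrap around the end.

    `returns` may hold floats or per-period tuples such as
    (stock_return, bond_return); tuples keep the assets aligned in time.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if block_size < 1:
        raise ValueError("block_size must be >= 1")
    rng = rng or random.Random()
    n = len(returns)
    out = []
    while len(out) < n_samples:
        start = rng.randrange(n)  # block start drawn uniformly from all positions
        # Indices are taken modulo n, so a block that starts near the end
        # continues from the beginning of the series.
        out.extend(returns[(start + k) % n] for k in range(block_size))
    return out[:n_samples]  # trim the last block to the requested length

# Hypothetical toy data: six periods of returns.
series = [0.01, -0.02, 0.005, 0.015, -0.01, 0.02]
path = circular_block_bootstrap(series, block_size=3, n_samples=10,
                                rng=random.Random(0))
print(len(path))
```

In practice the block length would be chosen relative to how long dependence in the data persists; the value 3 above is arbitrary.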

Problem

Research questions and friction points this paper is trying to address.

Enhancing portfolio strategies using PPO

Dynamic rebalancing with regret-based rewards

Mitigating transaction fees and signal loss
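
The summary does not describe how the transaction cost scheduler works, so the following is only one plausible reading, offered as a sketch: the cost rate charged during training is ramped linearly from a low starting value up to the target fee level, so that early learning is not swamped by fees. The function names, the linear ramp, and the one-sided turnover cost model are all assumptions made here, not details from the paper.

```python
def scheduled_cost_rate(step, total_steps, target_rate, start_rate=0.0):
    """Hypothetical schedule: ramp the cost rate linearly over training."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so the rate stays at target_rate after the ramp.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start_rate + frac * (target_rate - start_rate)

def net_return(gross_return, old_stock_w, new_stock_w, cost_rate):
    """Subtract a cost proportional to the change in the stock weight."""
    turnover = abs(new_stock_w - old_stock_w)  # one-sided turnover (assumed)
    return gross_return - cost_rate * turnover

# Assumed example: target fee of 10 basis points, reached after 100 steps.
for step in (0, 50, 100):
    rate = scheduled_cost_rate(step, total_steps=100, target_rate=0.001)
    print(step, net_return(0.01, old_stock_w=0.6, new_stock_w=0.7, cost_rate=rate))
```

Other schedules (step-wise, exponential, or adaptive to observed turnover) would fit the same interface; the linear form is chosen here only for simplicity.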

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proximal Policy Optimization agents

Regret-based Sharpe reward

Transaction cost scheduler
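
A regret-based Sharpe reward with a forward-looking window could be sketched as follows. The exact definition used in the paper is not given in this summary, so the formulation below is an assumption: the reward is the Sharpe ratio of the agent's stock weight over a future window minus the best Sharpe ratio an oracle could achieve over the same window from a grid of candidate weights. The helper names, the candidate grid, the zero risk-free rate, and the per-period (non-annualized) Sharpe ratio are all illustrative choices.

```python
import statistics

def sharpe(returns):
    """Per-period Sharpe ratio with a zero risk-free rate (assumed)."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0

def portfolio_returns(stock_w, stock_r, bond_r):
    """Returns of a two-asset portfolio holding stock_w in stocks, rest in bonds."""
    return [stock_w * s + (1.0 - stock_w) * b for s, b in zip(stock_r, bond_r)]

def regret_sharpe_reward(agent_w, future_stock_r, future_bond_r, candidates):
    """Negative regret: agent Sharpe minus the oracle's best Sharpe on the window.

    The reward is at most 0 whenever agent_w is one of the candidates,
    and equals 0 when the agent matches the oracle's choice.
    """
    agent = sharpe(portfolio_returns(agent_w, future_stock_r, future_bond_r))
    oracle = max(
        sharpe(portfolio_returns(w, future_stock_r, future_bond_r))
        for w in candidates
    )
    return agent - oracle

# Hypothetical future window of five periods.
stock_r = [0.02, -0.01, 0.015, -0.005, 0.01]
bond_r = [0.002, 0.004, -0.001, 0.003, 0.001]
grid = [i / 10 for i in range(11)]  # candidate stock weights 0.0 .. 1.0
print(regret_sharpe_reward(0.6, stock_r, bond_r, grid))  # 60/40 allocation
```

Because the reward depends on returns after the decision time, it is only computable during training on historical or synthetic data, which is consistent with the abstract's use of Oracle agents and bootstrapped series.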
