🤖 AI Summary
Standard state normalization techniques in reinforcement learning (RL) for portfolio optimization discard absolute asset value information, such as nominal prices and market capitalizations, leading to substantial performance degradation in non-cryptocurrency markets (e.g., IBOVESPA, NYSE). Method: The study systematically compares two prevalent normalization approaches across three heterogeneous financial markets, analyzing their impact on numerical stability, economic interpretability, and generalization. Contribution/Results: It demonstrates that conventional preprocessing, while improving numerical conditioning, impairs the agent's ability to perceive absolute economic magnitudes, thereby reducing risk-adjusted returns and cross-market transferability. The work advocates preserving interpretable, dimensionally consistent economic quantities in state representation design, rather than applying blind standardization. Empirical results show that eliminating or reengineering normalization improves annualized returns by 12–28% and enhances strategy robustness, challenging the prevailing assumption in RL-based finance that normalization is universally beneficial.
📝 Abstract
Recently, reinforcement learning has achieved remarkable results in various domains, including robotics, games, natural language processing, and finance. In the financial domain, this approach has been applied to tasks such as portfolio optimization, where an agent continuously adjusts the allocation of assets within a financial portfolio to maximize profit. Numerous studies have introduced new simulation environments, neural network architectures, and training algorithms for this purpose. Among these, a domain-specific policy gradient algorithm has gained significant attention in the research community for being lightweight, fast, and for outperforming other approaches. However, recent studies have shown that this algorithm can yield inconsistent results and underperform, especially when the portfolio does not consist of cryptocurrencies. One possible explanation for this issue is that the commonly used state normalization method may cause the agent to lose critical information about the true value of the assets being traded. This paper explores this hypothesis by evaluating two of the most widely used normalization methods across three different markets (IBOVESPA, NYSE, and cryptocurrencies) and comparing them against training on unnormalized data. The results indicate that, in this specific domain, state normalization can indeed degrade the agent's performance.
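The abstract does not specify the two normalization methods, but the information-loss argument can be illustrated with two schemes commonly used in RL portfolio work: dividing a lookback window of prices by the most recent close, and z-score standardization. The function names below are illustrative, not from the paper; the sketch only shows why both schemes erase absolute price levels. Both transforms are invariant under a positive rescaling of prices, so two assets with very different nominal values can produce identical states.

```python
import numpy as np

def price_relative_state(closes):
    """Normalize a price window by its latest close, so the agent
    only observes relative price movements (a common scheme in
    RL-based portfolio optimization)."""
    closes = np.asarray(closes, dtype=float)
    return closes / closes[-1]

def zscore_state(closes):
    """Z-score standardization of the price window: subtract the
    window mean and divide by the window standard deviation."""
    closes = np.asarray(closes, dtype=float)
    return (closes - closes.mean()) / closes.std()

# A $100 asset and a $1 asset with the same relative movements
# yield identical states under both schemes, so any information
# carried by the absolute price level is discarded.
expensive = [100.0, 110.0, 121.0]
cheap = [1.0, 1.1, 1.21]
print(np.allclose(price_relative_state(expensive), price_relative_state(cheap)))
print(np.allclose(zscore_state(expensive), zscore_state(cheap)))
```

An agent trained on unnormalized states, by contrast, would receive `[100.0, 110.0, 121.0]` and `[1.0, 1.1, 1.21]` as distinct observations, preserving the absolute economic magnitudes the paper argues matter outside cryptocurrency markets.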