🤖 AI Summary
This work addresses the unstable training, poor generalization, and limited interpretability that conventional deep reinforcement learning approaches commonly exhibit in financial trading. To overcome these limitations, the authors propose Physics-Informed Kolmogorov–Arnold Networks (PIKANs), which combine physics-informed regularization with the Kolmogorov–Arnold Network (KAN) architecture. PIKANs replace traditional multilayer perceptrons with KANs and add a second-order temporal consistency constraint derived from Newtonian mechanics, with the aim of improving policy stability, interpretability, and generalization. In empirical evaluations, the proposed method consistently outperforms baseline models across Chinese, U.S., and Vietnamese markets, achieving higher cumulative return, annualized return, Sharpe ratio, and Calmar ratio while significantly reducing maximum drawdown.
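For readers less familiar with the evaluation metrics named above, the following is a minimal sketch of how they are commonly computed from a series of per-period portfolio returns. The summary does not state the exact conventions used, so the function name `performance_metrics`, the 252-period annualization factor, the zero risk-free rate, and the definition of the Calmar ratio as annualized return divided by maximum drawdown are all assumptions made for illustration.

```python
import math

TRADING_DAYS = 252  # assumed annualization factor; not stated in the source


def performance_metrics(returns, periods_per_year=TRADING_DAYS):
    """Compute common portfolio metrics from simple per-period returns.

    Assumes a zero risk-free rate and at least two return observations.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two returns")

    # Wealth curve starting from an initial value of 1.0.
    wealth = []
    value = 1.0
    for r in returns:
        value *= 1.0 + r
        wealth.append(value)

    cumulative_return = wealth[-1] - 1.0
    annualized_return = wealth[-1] ** (periods_per_year / n) - 1.0

    # Sharpe ratio: mean over sample standard deviation, annualized.
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    std = math.sqrt(variance)
    sharpe = mean / std * math.sqrt(periods_per_year) if std > 0 else float("nan")

    # Maximum drawdown: largest relative fall from a running peak.
    peak = 1.0
    max_drawdown = 0.0
    for v in wealth:
        peak = max(peak, v)
        max_drawdown = max(max_drawdown, (peak - v) / peak)

    # Calmar ratio: annualized return per unit of maximum drawdown.
    calmar = annualized_return / max_drawdown if max_drawdown > 0 else float("inf")

    return {
        "cumulative_return": cumulative_return,
        "annualized_return": annualized_return,
        "sharpe": sharpe,
        "max_drawdown": max_drawdown,
        "calmar": calmar,
    }
```

Under these conventions, a lower maximum drawdown raises the Calmar ratio for the same annualized return, which is why the two are usually reported together.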
📝 Abstract
Deep Reinforcement Learning (DRL), a subset of machine learning focused on sequential decision-making, has emerged as a powerful approach for tackling financial trading problems. In finance, DRL is commonly used either to generate discrete trade signals or to determine continuous portfolio allocations. In this work, we propose a novel reinforcement learning framework for portfolio optimization that incorporates Physics-Informed Kolmogorov-Arnold Networks (PIKANs) into several DRL algorithms. The approach replaces conventional multilayer perceptrons with Kolmogorov-Arnold Networks (KANs) in both the actor and critic components, using learnable B-spline univariate functions to achieve parameter-efficient and more interpretable function approximation. During actor updates, we introduce a physics-informed regularization loss that promotes second-order temporal consistency between observed return dynamics and the action-induced portfolio adjustments. The proposed framework is evaluated across three equity markets (China, Vietnam, and the United States), covering both emerging and developed economies. Across all three markets, PIKAN-based agents consistently deliver higher cumulative and annualized returns, superior Sharpe and Calmar ratios, and more favorable drawdown characteristics compared to both standard DRL baselines and classical online portfolio-selection methods. The framework also yields more stable training than traditional DRL counterparts. The approach is particularly valuable in highly dynamic and noisy financial markets, where conventional DRL often suffers from instability and poor generalization.
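To make the "learnable B-spline univariate functions" concrete, here is a minimal sketch of a KAN-style layer in plain Python. Each edge applies its own univariate spline to one input, and each output sums its incoming edges. This is an illustration under stated assumptions, not the authors' implementation: the helper names (`uniform_knots`, `bspline_basis`, `kan_edge`, `kan_layer`), the uniform knot grid, and the cubic degree are all hypothetical choices. The residual base activation used in the original KAN formulation is omitted for brevity.

```python
def uniform_knots(lo, hi, num_intervals, degree):
    """Uniform knot vector on [lo, hi], extended by `degree` knots per side."""
    h = (hi - lo) / num_intervals
    return [lo + (i - degree) * h for i in range(num_intervals + 2 * degree + 1)]


def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions at x (Cox-de Boor recursion).

    Returns len(knots) - degree - 1 values. They sum to 1 for x in the
    half-open interval [knots[degree], knots[-degree - 1]).
    """
    # Degree 0: indicator functions of the half-open knot intervals.
    basis = [
        1.0 if knots[i] <= x < knots[i + 1] else 0.0
        for i in range(len(knots) - 1)
    ]
    for k in range(1, degree + 1):
        raised = []
        for i in range(len(basis) - 1):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            left = (x - knots[i]) / left_den * basis[i] if left_den > 0 else 0.0
            right = (
                (knots[i + k + 1] - x) / right_den * basis[i + 1]
                if right_den > 0
                else 0.0
            )
            raised.append(left + right)
        basis = raised
    return basis


def kan_edge(x, coeffs, knots, degree):
    """One learnable univariate function: phi(x) = sum_i coeffs[i] * B_i(x)."""
    return sum(c * b for c, b in zip(coeffs, bspline_basis(x, knots, degree)))


def kan_layer(inputs, coeffs, knots, degree):
    """KAN-style layer: output j is the sum over inputs i of phi_{j,i}(x_i).

    coeffs[j][i] holds the spline coefficients of the edge from input i
    to output j; these coefficients are the trainable parameters.
    """
    return [
        sum(kan_edge(x, edge, knots, degree) for x, edge in zip(inputs, row))
        for row in coeffs
    ]
```

Because every edge is a one-dimensional curve defined by a handful of coefficients, each learned function can be plotted and inspected directly, which is the basis of the interpretability claim usually made for KANs.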