🤖 AI Summary
Problem: Constructing profitable trading strategies under high stock market volatility remains challenging. Method: This paper proposes four novel profit-oriented loss functions that let potentially any neural network learn long/short (buy/sell) decisions end to end. The losses are integrated into Transformer-based time-series models (e.g., Crossformer) and optimize trading signals directly rather than price predictions. Results: On a portfolio of 50 S&P 500 constituents, with 2021, 2022, and 2023 as separate test periods, Crossformer with the best loss function achieves returns of 51.42%, 51.04%, and 48.62% respectively, outperforming reinforcement learning baselines (PPO, DDPG) and a buy-and-hold strategy. Contribution: The work presents a profit-driven supervised approach to trading as an alternative to RL-based methods, offering a new framework for financial time-series modeling.
📝 Abstract
Stock trading has always been a challenging task due to the highly volatile nature of the stock market. Making sound trading decisions to generate profit is particularly difficult under such conditions. To address this, we propose four novel loss functions to drive decision-making for a portfolio of stocks. These functions account for the potential profits or losses from buying or shorting each stock, enabling potentially any artificial neural network to directly learn an effective trading strategy. Despite the high volatility of the stock market over time, training time-series models such as transformers on these loss functions resulted in trading strategies that generated significant profits on a portfolio of 50 different S&P 500 company stocks, compared with benchmark reinforcement learning techniques and a baseline buy-and-hold method. As an example, using 2021, 2022 and 2023 as three test periods, the Crossformer model adapted with our best loss function was the most consistent, resulting in returns of 51.42%, 51.04% and 48.62% respectively. In comparison, the best-performing state-of-the-art reinforcement learning methods, PPO and DDPG, delivered maximum profits of only around 41%, 2.81% and 41.58% for the same periods. The code is available at https://anonymous.4open.science/r/bandit-stock-trading-58C8/README.md.
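The abstract does not spell out the four loss functions, so the following is only a minimal sketch of the general idea of a profit-oriented loss, not the authors' formulation. It assumes each raw model output is squashed with tanh into a position between -1 (fully short) and +1 (fully long), and that the loss is the negative mean profit over the portfolio for one period. The function name `profit_loss` and the sample numbers are hypothetical.

```python
import math


def profit_loss(scores, next_returns):
    """Negative mean portfolio profit for one time step (illustrative only).

    scores: raw model outputs, one per stock (any real number).
    next_returns: realized fractional return of each stock over the next period.
    """
    if len(scores) != len(next_returns):
        raise ValueError("scores and next_returns must have the same length")
    # Assumption: tanh maps each score to a position in (-1, 1);
    # positive means long (buy), negative means short.
    positions = [math.tanh(s) for s in scores]
    # A long position earns r and a short position earns -r,
    # so the profit of each stock is position * return.
    profits = [p * r for p, r in zip(positions, next_returns)]
    # Minimizing the negative mean profit maximizes the mean profit.
    return -sum(profits) / len(profits)


# Hypothetical data: the model is bullish on the first two stocks
# and bearish on the third.
scores = [2.0, 0.5, -1.5]
next_returns = [0.03, -0.01, -0.02]
loss = profit_loss(scores, next_returns)
print(f"loss = {loss:.5f}")  # negative loss means the positions were profitable
```

Because the loss depends on the trading positions rather than on a predicted price, a gradient-based optimizer in a deep learning framework could push the network toward profitable long/short decisions directly, which is the contrast with price prediction that the abstract draws.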