Comparing Different Transformer Model Structures for Stock Prediction

📅 2025-04-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the impact of Transformer architectural design choices on stock index forecasting performance, addressing a critical gap in systematic empirical evaluation of model architectures for financial time series. We systematically evaluate five Transformer variants—Encoder-only, Decoder-only, standard Encoder-Decoder, embedding-free, and ProbSparse—at daily frequency across multiple markets (e.g., CSI 300), using MAE and MSE for quantitative assessment. Our key findings are: (1) the Decoder-only Transformer consistently achieves 12.6–18.3% lower MAE than all other Transformer variants and significantly outperforms LSTM and ARIMA baselines; (2) ProbSparse attention degrades average forecasting accuracy by 9.7%, indicating its unsuitability for high-noise, low-periodicity financial time series. These results challenge the prevailing assumption of universal applicability for sparse attention mechanisms and provide empirically grounded guidance for architecture selection in financial forecasting.
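The summary's quantitative claims rest on two standard error metrics, MAE and MSE. A minimal sketch of how they are computed for a price forecast (illustrative only; the figures below are hypothetical, not from the paper):

```python
def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of forecast errors
    assert len(y_true) == len(y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean Squared Error: squares errors, so large misses dominate
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical daily closing-index forecast vs. actuals
actual = [3921.5, 3933.2, 3910.8]
forecast = [3918.0, 3940.0, 3905.0]
print(round(mae(actual, forecast), 3))  # 5.367
print(round(mse(actual, forecast), 3))  # 30.71
```

Because MSE squares each error, a model with a few large misses can have a competitive MAE but a much worse MSE, which is why papers in this area typically report both.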

📝 Abstract
This paper compares different Transformer model architectures for stock index prediction. While many studies have shown that Transformers perform well in stock price forecasting, few have explored how different structural designs impact performance. Most existing works treat the Transformer as a black box, overlooking how specific architectural choices may affect predictive accuracy. Understanding these differences, however, is critical for developing more effective forecasting models. This study therefore aims to identify which Transformer variant is most suitable for stock forecasting, evaluating five structures: (1) encoder-only Transformer, (2) decoder-only Transformer, (3) Vanilla Transformer (encoder + decoder), (4) Vanilla Transformer without embedding layers, and (5) Vanilla Transformer with ProbSparse attention. Results show that Transformer-based models generally outperform traditional approaches. The decoder-only Transformer outperforms all other models in all scenarios, while the Transformer with ProbSparse attention performs worst in almost all cases.
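The distinguishing feature of the best-performing decoder-only variant is causal (masked) self-attention, where each time step may attend only to past steps. A minimal NumPy sketch of single-head causal attention, the core of a decoder-only block (an illustration under assumed shapes, not the paper's implementation):

```python
import numpy as np

def causal_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a sequence x of shape (T, d).
    Position t can only attend to positions <= t, which is what lets a
    decoder-only model forecast the next step from past prices alone."""
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)                 # (T, T) attention logits
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf                      # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Sanity check: perturbing a future time step must not change earlier outputs.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((6, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_attention(x, w_q, w_k, w_v)
x2 = x.copy()
x2[5] += 10.0                                    # change only the last step
out2 = causal_attention(x2, w_q, w_k, w_v)
print(np.allclose(out[:5], out2[:5]))            # True: past is unaffected
```

ProbSparse attention (from Informer) instead computes scores for only a subset of "dominant" queries to cut cost; the paper's finding is that this sparsification discards information that noisy, weakly periodic stock series cannot afford to lose.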
Problem

Research questions and friction points this paper is trying to address.

Compare Transformer architectures for stock index prediction
Evaluate impact of structural designs on forecasting performance
Identify optimal Transformer variant for stock price prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares five Transformer architectures for stock index prediction
Decoder-only Transformer performs best overall
ProbSparse attention performs worst in almost all cases