Comparing Different Transformer Model Structures for Stock Prediction

📅 2025-04-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the impact of Transformer architectural design choices on stock index forecasting performance, addressing a critical gap in systematic empirical evaluation of model architectures for financial time series. We systematically evaluate five Transformer variants—Encoder-only, Decoder-only, standard Encoder-Decoder, embedding-free, and ProbSparse—at daily frequency across multiple markets (e.g., CSI 300), using MAE and MSE for quantitative assessment. Our key findings are: (1) the Decoder-only Transformer consistently achieves 12.6–18.3% lower MAE than all other Transformer variants and significantly outperforms LSTM and ARIMA baselines; (2) ProbSparse attention degrades average forecasting accuracy by 9.7%, indicating its unsuitability for high-noise, low-periodicity financial time series. These results challenge the prevailing assumption of universal applicability for sparse attention mechanisms and provide empirically grounded guidance for architecture selection in financial forecasting.
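The summary's quantitative claims rest on two standard error metrics, MAE and MSE. A minimal sketch of how they are computed for a price forecast (illustrative only; the figures below are hypothetical, not from the paper):

```python
def mae(y_true, y_pred):
    # Mean Absolute Error: average magnitude of forecast errors
    assert len(y_true) == len(y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    # Mean Squared Error: squares errors, so large misses dominate
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical daily closing-index forecast vs. actuals
actual = [3921.5, 3933.2, 3910.8]
forecast = [3918.0, 3940.0, 3905.0]
print(round(mae(actual, forecast), 3))  # 5.367
print(round(mse(actual, forecast), 3))  # 30.71
```

Because MSE squares each error, a model with a few large misses can have a competitive MAE but a much worse MSE, which is why papers in this area typically report both.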

📝 Abstract
This paper compares different Transformer model architectures for stock index prediction. While many studies have shown that Transformers perform well in stock price forecasting, few have explored how different structural designs impact performance. Most existing works treat the Transformer as a black box, overlooking how specific architectural choices may affect predictive accuracy. Understanding these differences, however, is critical for developing more effective forecasting models. This study therefore aims to identify which Transformer variant is most suitable for stock forecasting, evaluating five structures: (1) encoder-only Transformer, (2) decoder-only Transformer, (3) Vanilla Transformer (encoder + decoder), (4) Vanilla Transformer without embedding layers, and (5) Vanilla Transformer with ProbSparse attention. Results show that Transformer-based models generally outperform traditional approaches. The decoder-only Transformer outperforms all other models in all scenarios, while the Transformer with ProbSparse attention performs worst in almost all cases.
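The distinguishing feature of the best-performing decoder-only variant is causal (masked) self-attention, where each time step may attend only to past steps. A minimal NumPy sketch of single-head causal attention, the core of a decoder-only block (an illustration under assumed shapes, not the paper's implementation):

```python
import numpy as np

def causal_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over a sequence x of shape (T, d).
    Position t can only attend to positions <= t, which is what lets a
    decoder-only model forecast the next step from past prices alone."""
    T, d = x.shape
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(d)                 # (T, T) attention logits
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[future] = -np.inf                      # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Sanity check: perturbing a future time step must not change earlier outputs.
rng = np.random.default_rng(0)
d = 4
x = rng.standard_normal((6, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_attention(x, w_q, w_k, w_v)
x2 = x.copy()
x2[5] += 10.0                                    # change only the last step
out2 = causal_attention(x2, w_q, w_k, w_v)
print(np.allclose(out[:5], out2[:5]))            # True: past is unaffected
```

ProbSparse attention (from Informer) instead computes scores for only a subset of "dominant" queries to cut cost; the paper's finding is that this sparsification discards information that noisy, weakly periodic stock series cannot afford to lose.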
Problem

Research questions and friction points this paper is trying to address.

Compare Transformer architectures for stock index prediction
Evaluate impact of structural designs on forecasting performance
Identify optimal Transformer variant for stock price prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares five Transformer architectures for stock index prediction
Decoder-only Transformer performs best overall
ProbSparse attention performs worst in almost all cases