🤖 AI Summary
This study systematically evaluates machine learning and econometric methods for forecasting government bond yield curves, addressing the ongoing debate over model selection in financial time-series modeling. Using 47 years of daily U.S. Treasury data, it applies TimeGPT, multiple forecasting-oriented Transformer architectures, and ensemble learning approaches to this task for the first time, and benchmarks them against ARIMA and its extensions, naive baselines, LightGBM (LGBM), and recurrent neural networks (RNNs). The results indicate that traditional econometric models, ARIMA and the naive benchmarks in particular, outperform the other models overall, except in one time block. Among the machine learning approaches, TimeGPT, LGBM, and RNNs perform best. The study also examines whether stationary or nonstationary input data are more appropriate for deep learning models, offering empirical evidence and methodological guidance for yield curve forecasting.
📝 Abstract
While machine learning has revolutionized fields such as natural language processing (NLP) and computer vision, its impact on time-series forecasting is still widely disputed, especially in finance. This paper compares the forecasting performance of econometric/time-series methods, classical machine learning, and deep learning on U.S. Treasury yield curve data, using daily observations spanning 47 years. The Treasury yield curve matters because it is used by every participant in the bond markets, which are larger than the equity markets. We examine a variety of methods that have not previously been tested on yield curve forecasting, especially deep learning algorithms. The methods include the Autoregressive Integrated Moving Average (ARIMA) model and its extensions, naive benchmarks, ensemble methods, Recurrent Neural Networks (RNNs), and multiple transformers built for forecasting. ARIMA and naive econometric models outperform the other models overall, except in one time block. Of the machine learning methods, TimeGPT, LightGBM (LGBM), and RNNs perform best. The paper also explores whether stationary or nonstationary data are more appropriate as input to deep learning models.
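To make the naive benchmark and the stationary versus nonstationary input question concrete, the following is a minimal sketch and not the paper's implementation. It assumes a random-walk naive forecast (tomorrow's yield equals today's) and first differencing as the stationarity transform; the synthetic series, the function names, and the RMSE metric are all illustrative assumptions that do not come from the paper.

```python
import numpy as np


def naive_forecast(levels):
    """Random-walk forecast: the prediction for day t+1 is the value on day t."""
    levels = np.asarray(levels, dtype=float)
    return levels[:-1]


def to_differences(levels):
    """First differences, a common (assumed) way to obtain a stationary input."""
    return np.diff(np.asarray(levels, dtype=float))


def from_differences(previous_levels, predicted_diffs):
    """Map predicted day-over-day changes back to yield levels."""
    return np.asarray(previous_levels, dtype=float) + np.asarray(predicted_diffs, dtype=float)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length arrays."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Synthetic stand-in for one maturity of a daily yield series (NOT real Treasury data).
rng = np.random.default_rng(0)
yields = 4.0 + np.cumsum(rng.normal(0.0, 0.05, size=500))

actual = yields[1:]                 # realized next-day levels
naive_pred = naive_forecast(yields)  # nonstationary (level) view

# Stationary view: model the changes, then convert predictions back to levels.
diffs = to_differences(yields)
zero_change_pred = from_differences(yields[:-1], np.zeros_like(diffs))

naive_rmse = rmse(actual, naive_pred)
print(f"naive RMSE on synthetic data: {naive_rmse:.4f}")
```

Predicting a zero change in the differenced series reproduces the naive forecast in levels, which is why the naive benchmark is a natural reference point whichever representation a model is trained on.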