🤖 AI Summary
Hydrological time-series modeling faces challenges in identifying critical features and ensuring physical interpretability. Method: We systematically evaluate over 20 state-of-the-art models, including LSTM, TCN, and Transformer, on approximately 8,000 global catchments from the CAMELS and Caravan datasets, and propose an exogenous-information-driven modeling paradigm. We identify natural annual periodic exogenous variables (e.g., monthly mean temperature, solar radiation) as the most impactful for rainfall–runoff prediction, and design a multi-configuration exogenous fusion strategy that combines static catchment attributes with dynamic meteorological forcings. Contribution/Results: Integrating exogenous information reduces mean squared error (MSE) by up to 40% on the largest dataset. We open-source a Jupyter/Colab-based framework, standardized preprocessing pipelines, and full implementation code to support reproducibility.
📝 Abstract
Deep learning approaches to time series analysis, including foundation models, are under active investigation. However, most studies do not address significant scientific applications. This paper aims to identify key features in time series by examining hydrology data. Our work advances computer science by emphasizing critical application features and contributes to hydrology and other scientific fields by identifying modeling approaches that effectively capture these features. Scientific time series data are inherently complex: they involve observations from multiple locations, each with several time-dependent data streams and with exogenous factors that may be static or time-varying, and either application-dependent or purely mathematical. This research analyzes hydrology time series from the CAMELS and Caravan global datasets, which cover rainfall and runoff data across catchments, with up to six observed streams and 209 static parameters at approximately 8,000 locations. We assess the impact of exogenous data using eight different model configurations for key hydrology tasks. Results show that integrating exogenous information improves data representation, reducing mean squared error by up to 40% on the largest dataset. Additionally, we present a detailed performance comparison of over 20 state-of-the-art pattern and foundation models. The analysis is fully open-source, with Jupyter notebooks on Google Colab covering LSTM-based modeling, data preprocessing, and model comparisons. Preliminary findings with alternative deep learning architectures indicate that models incorporating comprehensive observed and exogenous data outperform more limited approaches, including foundation models. Notably, natural annual periodic exogenous time series contribute the largest improvements, though static and other periodic factors are also valuable.
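To make the idea of combining observed streams with static and periodic exogenous inputs concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function names (`annual_periodic_features`, `fuse_inputs`, `mse`) are hypothetical, the data are random placeholders, and the sine/cosine encoding of the day of year is an assumed "purely mathematical" stand-in for an annual periodic exogenous series. Only the sizes (six observed streams, 209 static parameters) are taken from the abstract.

```python
import numpy as np

def annual_periodic_features(day_of_year, period=365.25):
    """Encode the annual cycle as (sin, cos) pairs; returns shape (T, 2).

    Assumption: a simple mathematical encoding, not the paper's feature set.
    """
    phase = 2.0 * np.pi * np.asarray(day_of_year, dtype=float) / period
    return np.stack([np.sin(phase), np.cos(phase)], axis=-1)

def fuse_inputs(dynamic, static, day_of_year):
    """Build one model input matrix for a single catchment.

    dynamic:     (T, D) observed time-varying streams
    static:      (S,)   static catchment attributes, repeated at every step
    day_of_year: (T,)   calendar position of each time step
    Returns an array of shape (T, D + S + 2).
    """
    dynamic = np.asarray(dynamic, dtype=float)
    static = np.asarray(static, dtype=float)
    n_steps = dynamic.shape[0]
    static_tiled = np.tile(static, (n_steps, 1))      # (T, S)
    periodic = annual_periodic_features(day_of_year)  # (T, 2)
    return np.concatenate([dynamic, static_tiled, periodic], axis=1)

def mse(y_true, y_pred):
    """Mean squared error, the metric quoted in the abstract."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Placeholder data with the sizes mentioned in the abstract.
rng = np.random.default_rng(0)
n_steps = 730
days = np.arange(n_steps) % 365 + 1
dynamic = rng.normal(size=(n_steps, 6))   # up to six observed streams
static = rng.normal(size=209)             # 209 static parameters

x = fuse_inputs(dynamic, static, days)
print(x.shape)  # (730, 217)
```

Dropping or keeping individual groups of columns in `fuse_inputs` is one simple way to set up alternative input configurations and compare their `mse` values under an otherwise identical model.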