AIFL: A Global Daily Streamflow Forecasting Model Using Deterministic LSTM Pre-trained on ERA5-Land and Fine-tuned on IFS

📅 2026-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the performance degradation that data-driven models suffer when moving from reanalysis data to operational numerical weather prediction, and proposes a two-stage transfer learning strategy. A deterministic LSTM is first pre-trained on ERA5-Land reanalysis (1980–2019), then fine-tuned on IFS control forecasts (2016–2019) so that it adapts to their error structure and biases. The authors describe the result as the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021–2024), the model achieves a median KGE' of 0.66 and a median NSE of 0.53, shows reliable skill in detecting extreme events, and performs comparably to current state-of-the-art global systems, all within a transparent and reproducible forcing pipeline.
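The relationship between the three periods can be made explicit with a small sketch. It only encodes the year ranges given in the summary; the exact calendar boundaries (1 January to 31 December), the dictionary keys, and the helper names `overlaps` and `n_years` are assumptions for illustration, not details taken from the paper.

```python
from datetime import date

# Year ranges as stated in the summary. Exact start/end days are assumed
# (full calendar years); the paper may use different boundaries.
PERIODS = {
    "pretrain_era5_land": (date(1980, 1, 1), date(2019, 12, 31)),
    "finetune_ifs_control": (date(2016, 1, 1), date(2019, 12, 31)),
    "test": (date(2021, 1, 1), date(2024, 12, 31)),
}


def overlaps(a, b):
    """Return True if two inclusive (start, end) date ranges share a day."""
    return a[0] <= b[1] and b[0] <= a[1]


def n_years(period):
    """Number of calendar years touched by an inclusive (start, end) range."""
    return period[1].year - period[0].year + 1


# The test period should not overlap any period used for training.
test_is_independent = not any(
    overlaps(PERIODS["test"], PERIODS[name])
    for name in PERIODS
    if name != "test"
)

# The fine-tuning window sits inside the tail of the pre-training window.
finetune_inside_pretrain = overlaps(
    PERIODS["pretrain_era5_land"], PERIODS["finetune_ifs_control"]
)
```

Under these assumptions the pre-training window spans 40 years, the fine-tuning window covers its last 4 years with a different forcing source, and the test window is disjoint from both.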

📝 Abstract

Reliable global streamflow forecasting is essential for flood preparedness and water resource management, yet data-driven models often suffer from a performance gap when transitioning from historical reanalysis to operational forecast products. This paper introduces AIFL (Artificial Intelligence for Floods), a deterministic LSTM-based model designed for global daily streamflow forecasting. Trained on 18,588 basins curated from the CARAVAN dataset, AIFL utilises a novel two-stage training strategy to bridge the reanalysis-to-forecast domain shift. The model is first pre-trained on 40 years of ERA5-Land reanalysis (1980-2019) to capture robust hydrological processes, then fine-tuned on operational Integrated Forecasting System (IFS) control forecasts (2016-2019) to adapt to the specific error structures and biases of operational numerical weather prediction. To our knowledge, this is the first global model trained end-to-end within the CARAVAN ecosystem. On an independent temporal test set (2021-2024), AIFL achieves high predictive skill with a median modified Kling-Gupta Efficiency (KGE') of 0.66 and a median Nash-Sutcliffe Efficiency (NSE) of 0.53. Benchmarking results show that AIFL is highly competitive with current state-of-the-art global systems, achieving comparable accuracy while maintaining a transparent and reproducible forcing pipeline. The model demonstrates exceptional reliability in extreme-event detection, providing a streamlined and operationally robust baseline for the global hydrological community.
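For readers unfamiliar with the two skill scores quoted above, the following is a minimal sketch of how they are commonly computed for a single basin. It assumes the widely used formulation of the modified Kling-Gupta Efficiency from Kling et al. (2012), in which the variability term is a ratio of coefficients of variation, and the standard Nash-Sutcliffe definition. The abstract does not show the authors' implementation, so the function names and the population standard deviation used here are illustrative assumptions.

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _std(x):
    """Population standard deviation (an assumption; sample std is also used)."""
    m = _mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect, 0 matches the observed mean."""
    m = _mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - m) ** 2 for o in obs)
    return 1.0 - num / den


def kge_prime(obs, sim):
    """Modified Kling-Gupta Efficiency (assumed Kling et al. 2012 form)."""
    mo, ms = _mean(obs), _mean(sim)
    so, ss = _std(obs), _std(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)              # linear correlation
    beta = ms / mo                   # bias ratio
    gamma = (ss / ms) / (so / mo)    # variability ratio (ratio of CVs)
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Toy series, not real streamflow: a simulation that doubles every value.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0 * v for v in obs]
```

In this toy case the correlation and variability terms are ideal but the bias ratio is 2, so KGE' drops to about 0 while NSE is strongly negative. A global summary such as the medians reported in the abstract would apply `statistics.median` to the per-basin scores.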

Problem

Research questions and friction points this paper is trying to address.

streamflow forecasting

domain shift

global hydrological modeling

flood preparedness

data-driven models

Innovation

Methods, ideas, or system contributions that make the work stand out.

two-stage training

deterministic LSTM

ERA5-Land pre-training