🤖 AI Summary
Traditional physics-based and statistical models for water-level forecasting in the Florida Everglades suffer from high computational costs and poor generalizability. Method: This study systematically evaluates 12 task-specific models and 5 time-series foundation models (e.g., Chronos, TimesNet, PatchTST) on real-world wetland hydrological data. Contribution/Results: It provides the first empirical validation of time-series foundation models for complex, nonstationary wetland hydrology forecasting, revealing that Chronos significantly outperforms all baselines (23.6% lower MAE) and uncovering critical architecture–hydrodynamics alignment principles. Most foundation models underperform, while task-specific models exhibit strong architecture-dependent performance. We propose a multiscale error evaluation and attribution framework, establishing the first empirically grounded benchmark for environmental AI model selection tailored to wetland hydrology.
📝 Abstract
The Everglades play a crucial role in flood and drought regulation, water resource planning, and ecosystem management in the surrounding regions. However, traditional physics-based and statistical methods for predicting water levels often face significant challenges, including high computational costs and limited adaptability to diverse or unforeseen conditions. Recent advances in large time series models have demonstrated the potential to address these limitations, with state-of-the-art deep learning and foundation models achieving remarkable success in time series forecasting across various domains. Despite this progress, their application to critical environmental systems, such as the Everglades, remains underexplored. In this study, we fill this gap by investigating twelve task-specific models and five time series foundation models across six categories in a real-world application: water level prediction in the Everglades. Our primary results show that the foundation model Chronos significantly outperforms all other models, while the remaining foundation models exhibit relatively poor performance. Moreover, the performance of task-specific models varies with their architectures. Finally, we discuss possible reasons for these differences in model performance.
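The headline comparison rests on mean absolute error (MAE) and the relative reduction one model achieves over another. A minimal sketch of that computation, using hypothetical water-level observations and forecasts (the study's actual gauge data and model outputs are not reproduced here):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted water levels."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

# Hypothetical stage readings (feet) and two model forecasts for illustration.
observed = [2.10, 2.15, 2.20, 2.18, 2.25]
chronos_pred = [2.08, 2.16, 2.19, 2.20, 2.24]   # stand-in foundation-model output
baseline_pred = [2.00, 2.05, 2.30, 2.10, 2.35]  # stand-in task-specific baseline

mae_chronos = mae(observed, chronos_pred)
mae_baseline = mae(observed, baseline_pred)

# Relative MAE reduction, the form behind claims like "X% lower MAE".
reduction_pct = 100.0 * (mae_baseline - mae_chronos) / mae_baseline
print(f"MAE (Chronos-like): {mae_chronos:.3f} ft")
print(f"MAE (baseline):     {mae_baseline:.3f} ft")
print(f"Relative reduction: {reduction_pct:.1f}%")
```

The percentage reported in the study (23.6%) would come from applying this same formula to the real forecasts; the numbers above are illustrative only.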