🤖 AI Summary
This study addresses the limited accuracy and verifiability of large-scale static water table depth (WTD) maps, particularly in data-scarce regions. The authors build three XGBoost-based models at 500-m resolution across the United States and Canada. The models are constrained by known physical relations between WTD and its drivers and are trained on more than 20 million real and proxy WTD observations that are added sequentially. In a pixel-by-pixel evaluation over the whole study region and within ten major ecoregions of North America, the models predict unseen observations with a correlation of 0.6–0.75, compared with 0.21–0.40 for two available physically based simulations. The authors nonetheless argue that current large-scale WTD simulations remain uncertain in data-scarce areas such as steep mountains: observations are biased toward low-elevation floodplains and available models are overly flexible, and both factors limit verifiability. They close with directions for improving ML-based WTD estimation: proxy satellite data, physical laws, better verification standards, new globally available emergent indices, and more reliable observations.
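The summary does not say how the physical relations are imposed on the model. One common mechanism in XGBoost is monotonic constraints, so the sketch below only illustrates that option and is not the authors' method. The feature names and their signs are hypothetical placeholders, not drivers reported by the study.

```python
# Assumption: physical relations are expressed as monotonic constraints.
# Feature names and signs are hypothetical, chosen only for illustration.
# +1: predicted WTD may only increase with the feature
# -1: predicted WTD may only decrease with the feature
#  0: no constraint
ASSUMED_RELATIONS = {
    "height_above_drainage_m": 1,
    "annual_recharge_mm": -1,
}


def monotone_constraints(feature_order, relations):
    """Build a constraint string such as "(1,0,-1)", one sign per feature column."""
    unknown = set(relations) - set(feature_order)
    if unknown:
        raise ValueError(f"constraints given for unknown features: {sorted(unknown)}")
    signs = [relations.get(name, 0) for name in feature_order]
    if any(sign not in (-1, 0, 1) for sign in signs):
        raise ValueError("each sign must be -1, 0, or +1")
    return "(" + ",".join(str(sign) for sign in signs) + ")"


# Column order must match the order of the training matrix columns.
features = ["height_above_drainage_m", "soil_sand_fraction", "annual_recharge_mm"]
constraints = monotone_constraints(features, ASSUMED_RELATIONS)
print(constraints)
```

The resulting string is in the format XGBoost accepts for its `monotone_constraints` training parameter. Whether a given driver truly has a monotonic effect on WTD is a modeling decision that has to come from hydrological knowledge, not from this helper.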
📝 Abstract
Spatial patterns of water table depth (WTD) play a crucial role in shaping ecological resilience, hydrological connectivity, and human-centric systems. Generally, a large-scale (e.g., continental or global) continuous map of static WTD can be simulated using either physically based (PB) or machine learning-based (ML) models. We construct three fine-resolution (500 m) ML simulations of WTD across the United States and Canada, using the XGBoost algorithm and more than 20 million real and proxy observations of WTD. The three ML models were constrained using known physical relations between WTD and its drivers and were trained by sequentially adding real and proxy observations of WTD. Through an extensive (pixel-by-pixel) evaluation across the study region and within ten major ecoregions of North America, we demonstrate that our models (corr = 0.6–0.75) predict unseen real and proxy observations of WTD more accurately than two available PB simulations of WTD (corr = 0.21–0.40). However, we argue that currently available large-scale simulations of static WTD may still be uncertain within data-scarce regions such as steep mountainous regions. We reason that two factors can reduce the verifiability of large-scale simulations of WTD: biased observational data, collected mainly from low-elevation floodplains, and the over-flexibility of available models. Finally, we discuss future directions that may help hydrogeologists decide how to improve machine learning-based WTD estimations. In particular, we advocate for the use of proxy satellite data, the incorporation of physical laws, the implementation of better model verification standards, the development of novel globally available emergent indices, and the collection of more reliable observations.
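The evaluation described above reports a correlation between simulated and held-out observed WTD, both overall and per ecoregion. A minimal sketch of that kind of grouped scoring follows. It assumes "corr" means the Pearson coefficient, which the abstract does not state, and the region labels and values are synthetic, invented only to make the example run.

```python
import math
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def corr_by_region(records):
    """records: iterable of (region, observed_wtd, simulated_wtd) for held-out pixels."""
    observed = defaultdict(list)
    simulated = defaultdict(list)
    for region, obs, sim in records:
        observed[region].append(obs)
        simulated[region].append(sim)
    return {region: pearson(observed[region], simulated[region]) for region in observed}


# Synthetic held-out pixels (hypothetical regions and depths in metres).
held_out = [
    ("plains", 1.0, 2.0), ("plains", 2.0, 4.0), ("plains", 3.0, 6.0),
    ("mountains", 10.0, 30.0), ("mountains", 20.0, 20.0), ("mountains", 30.0, 10.0),
]
scores = corr_by_region(held_out)
print(scores)
```

Scoring each region separately matters for the argument in the abstract: a high pooled correlation can hide poor skill in regions with few observations, such as steep mountains.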