🤖 AI Summary
To address the labor-intensive, poorly generalizing quality control (QC) of streamflow data from remote sensors in continental-scale hydrological monitoring, this paper proposes HydroGEM, a foundation model for streamflow QC that supports zero-shot transfer. The method follows a two-stage paradigm of self-supervised pretraining followed by fine-tuning on synthetic anomalies. It combines a hybrid TCN-Transformer temporal architecture with hierarchical normalization that handles six orders of magnitude in discharge, enabling generalization across countries and flow magnitudes. On held-out synthetic tests covering 799 U.S. stations, the model achieves a detection F1-score of 0.792 and reduces reconstruction error by 68.7%. In zero-shot transfer to 100 Canadian stations, with no target-domain adaptation, it attains an F1-score of 0.586, exceeding all baseline methods. The work presents a scalable, generalizable foundation-model approach to real-time QC of large-scale hydrological data, with outputs intended as suggestions for expert review.
📝 Abstract
Real-time streamflow monitoring networks generate millions of observations annually, yet maintaining data quality across thousands of remote sensors remains labor-intensive. We introduce HydroGEM (Hydrological Generalizable Encoder for Monitoring), a foundation model for continental-scale streamflow quality control. HydroGEM uses two-stage training: self-supervised pretraining on 6.03 million sequences from 3,724 USGS stations learns hydrological representations, followed by fine-tuning with synthetic anomalies for detection and reconstruction. A hybrid TCN-Transformer architecture (14.2M parameters) captures local temporal patterns and long-range dependencies, while hierarchical normalization handles six orders of magnitude in discharge. On held-out synthetic tests comprising 799 stations with 18 expert-validated anomaly types, HydroGEM achieves F1 = 0.792 for detection and 68.7% reconstruction-error reduction, a 36.3% improvement over existing methods. Zero-shot transfer to 100 Environment and Climate Change Canada stations yields F1 = 0.586, exceeding all baselines and demonstrating cross-national generalization. The model maintains consistent detection across correction magnitudes and aligns with operational seasonal patterns. HydroGEM is designed for human-in-the-loop workflows: its outputs are quality control suggestions that require expert review, not autonomous corrections.
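The abstract does not spell out how the hierarchical normalization works. As a rough illustration of why normalization matters when discharge spans six orders of magnitude, the sketch below assumes a simple two-level scheme: a log transform applied to all stations, followed by per-station standardization. The function names, the `eps` offset, and the two-level design are assumptions for illustration only, not HydroGEM's actual method.

```python
import numpy as np


def normalize_discharge(q, eps=1e-6):
    """Log-transform, then standardize, one station's discharge series.

    Hypothetical helper: an assumed two-level scheme, not the paper's.
    Returns the normalized series and the (mean, std) needed to invert it.
    """
    q = np.asarray(q, dtype=float)
    # Level 1: a log transform compresses the range of magnitudes.
    # Negative readings are clipped to zero; eps avoids log10(0).
    log_q = np.log10(np.clip(q, 0.0, None) + eps)
    # Level 2: per-station statistics remove the remaining scale offset.
    mu = log_q.mean()
    sigma = log_q.std()
    if sigma < 1e-12:  # constant series: avoid division by zero
        sigma = 1.0
    return (log_q - mu) / sigma, (mu, sigma)


def denormalize_discharge(z, stats, eps=1e-6):
    """Invert normalize_discharge using the stored (mean, std)."""
    mu, sigma = stats
    return 10.0 ** (np.asarray(z, dtype=float) * sigma + mu) - eps


# Two stations with the same relative pattern, six orders of magnitude apart.
creek = [0.01, 0.02, 0.05, 0.03]
river = [1e4, 2e4, 5e4, 3e4]
z_creek, creek_stats = normalize_discharge(creek)
z_river, river_stats = normalize_discharge(river)
```

Under these assumptions, `z_creek` and `z_river` are nearly identical, so a single model sees the same shape whether the input comes from a small creek or a large river, and the stored statistics let a reconstructed series be mapped back to physical units.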