HydroGEM: A Self-Supervised Zero-Shot Hybrid TCN-Transformer Foundation Model for Continental-Scale Streamflow Quality Control

📅 2025-12-16

🤖 AI Summary

To address the labor-intensive, poorly generalizing quality control (QC) of streamflow data from remote sensors in continental-scale hydrological monitoring, this paper proposes HydroGEM, a foundation model for streamflow QC with zero-shot transfer. The method uses two-stage training (self-supervised pretraining followed by fine-tuning on synthetic anomalies), a hybrid TCN-Transformer architecture, and hierarchical normalization that handles six orders of magnitude in discharge. On held-out synthetic tests covering 799 stations, the model achieves an F1-score of 0.792 for detection and reduces reconstruction error by 68.7%. Zero-shot transfer to 100 Canadian stations, without any target-domain adaptation, yields an F1-score of 0.586, exceeding all baselines. The outputs are intended as QC suggestions for expert review rather than autonomous corrections.

📝 Abstract

Real-time streamflow monitoring networks generate millions of observations annually, yet maintaining data quality across thousands of remote sensors remains labor-intensive. We introduce HydroGEM (Hydrological Generalizable Encoder for Monitoring), a foundation model for continental-scale streamflow quality control. HydroGEM uses two-stage training: self-supervised pretraining on 6.03 million sequences from 3,724 USGS stations learns hydrological representations, followed by fine-tuning with synthetic anomalies for detection and reconstruction. A hybrid TCN-Transformer architecture (14.2M parameters) captures local temporal patterns and long-range dependencies, while hierarchical normalization handles six orders of magnitude in discharge. On held-out synthetic tests comprising 799 stations with 18 expert-validated anomaly types, HydroGEM achieves F1 = 0.792 for detection and 68.7% reconstruction-error reduction, a 36.3% improvement over existing methods. Zero-shot transfer to 100 Environment and Climate Change Canada stations yields F1 = 0.586, exceeding all baselines and demonstrating cross-national generalization. The model maintains consistent detection across correction magnitudes and aligns with operational seasonal patterns. HydroGEM is designed for human-in-the-loop workflows: outputs are quality control suggestions requiring expert review, not autonomous corrections.
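The two headline numbers in the abstract are a detection F1 score and a percentage reduction in reconstruction error. The sketch below shows one common way such metrics are computed. It is an illustration only: the function names, the point-wise treatment of anomaly flags, and the use of absolute error are assumptions, since the abstract does not state the exact evaluation protocol.

```python
def f1_score(true_flags, pred_flags):
    """Point-wise F1 for binary anomaly flags (1 = anomalous).

    Assumption: each timestep is scored independently; the paper may
    use a different (for example event-based) protocol.
    """
    if len(true_flags) != len(pred_flags):
        raise ValueError("flag sequences must have equal length")
    pairs = list(zip(true_flags, pred_flags))
    tp = sum(1 for t, p in pairs if t and p)        # correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)    # false alarms
    fn = sum(1 for t, p in pairs if t and not p)    # missed anomalies
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def error_reduction(clean, corrupted, reconstructed):
    """Fractional drop in total absolute error after reconstruction.

    Assumption: absolute error is used here for simplicity.
    """
    before = sum(abs(c - x) for c, x in zip(clean, corrupted))
    after = sum(abs(c - x) for c, x in zip(clean, reconstructed))
    if before == 0:
        raise ValueError("corrupted series is identical to the clean one")
    return 1 - after / before


# Toy example with made-up values (not data from the paper).
truth = [0, 1, 1, 0, 1]
preds = [0, 1, 0, 1, 1]
print(f1_score(truth, preds))

clean = [10, 10, 10, 10]
corrupted = [10, 20, 10, 0]
reconstructed = [10, 12, 10, 8]
print(error_reduction(clean, corrupted, reconstructed))
```

In this reading, a 68.7% reconstruction-error reduction would correspond to `error_reduction` returning about 0.687 on the synthetic test set.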

Problem

Research questions and friction points this paper is trying to address.

Develops a foundation model for continental-scale streamflow data quality control

Uses self-supervised learning and hybrid architecture to detect and reconstruct anomalies

Enables zero-shot transfer and human-in-the-loop workflows for operational use

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid TCN-Transformer architecture captures temporal patterns

Self-supervised pretraining learns hydrological representations from sequences

Hierarchical normalization handles wide discharge magnitude ranges
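To make the normalization point concrete, the sketch below shows one plausible way to put streams of very different size on a common scale: a log transform followed by per-station standardization. This is an assumption for illustration, not the scheme described in the paper; the names `normalize_station` and `denormalize_station` and the `eps` offset are hypothetical, and discharge is assumed to be non-negative.

```python
import math
from statistics import mean, pstdev


def normalize_station(discharge, eps=1e-6):
    """Log-transform a station's discharge, then z-score it.

    Level 1 (log10) compresses the orders-of-magnitude spread between
    stations; level 2 (z-score) removes each station's own offset and
    scale. The statistics are returned so the step can be inverted.
    """
    logged = [math.log10(q + eps) for q in discharge]
    mu = mean(logged)
    sigma = pstdev(logged) or 1.0  # fall back to 1.0 for a constant series
    return [(v - mu) / sigma for v in logged], (mu, sigma)


def denormalize_station(normalized, stats, eps=1e-6):
    """Invert normalize_station using the stored (mu, sigma)."""
    mu, sigma = stats
    return [10 ** (v * sigma + mu) - eps for v in normalized]


# Two made-up stations whose flows differ by six orders of magnitude.
creek = [0.01, 0.1, 1.0]
river = [1e4, 1e5, 1e6]

creek_norm, creek_stats = normalize_station(creek)
river_norm, river_stats = normalize_station(river)

print(creek_norm)
print(river_norm)
print(denormalize_station(river_norm, river_stats))
```

After this transform the two toy stations produce nearly identical normalized sequences, which is the property a single shared model needs in order to treat a small creek and a large river alike.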
