🤖 AI Summary
Environmental knowledge discovery is hindered by the scarcity of high-quality labeled data and by poor model generalization under atypical conditions. To address this, we propose A²SL, a framework that integrates self-supervised learning, multi-level pairwise similarity modeling, and scenario-aware selective data augmentation. A²SL retrieves semantically similar observations and selectively augments extreme or sparse scenarios to make ecological models more robust. Its key innovations are: (1) a transferable scenario encoder that measures semantic similarity across conditions; (2) an augmentation-adaptive mechanism that prioritizes hard examples; and (3) a multi-level pairwise loss that jointly optimizes representation learning and predictive performance. Evaluated on real-world lake water temperature and dissolved oxygen forecasting tasks, A²SL significantly outperforms state-of-the-art methods under data sparsity and distributional shift, reducing average prediction error by 18.7%. These results indicate strong generalizability and practical utility for environmental modeling.
📝 Abstract
The discovery of environmental knowledge depends on labeled, task-specific data but is often constrained by the high cost of data collection. Existing machine learning approaches usually struggle to generalize in data-sparse or atypical conditions. To address this, we propose an Augmentation-Adaptive Self-Supervised Learning (A$^2$SL) framework, which retrieves relevant observational samples to enhance modeling of the target ecosystem. Specifically, we introduce a multi-level pairwise learning loss to train a scenario encoder that captures varying degrees of similarity among scenarios. These learned similarities drive a retrieval mechanism that supplements a target scenario with relevant data from different locations or time periods. Furthermore, to better handle variable scenarios, particularly atypical or extreme conditions where traditional models struggle, we design an augmentation-adaptive mechanism that selectively enhances these scenarios through targeted data augmentation. Using freshwater ecosystems as a case study, we evaluate A$^2$SL on modeling water temperature and dissolved oxygen dynamics in real-world lakes. Experimental results show that A$^2$SL significantly improves predictive accuracy and enhances robustness in data-scarce and atypical scenarios. Although this study focuses on freshwater ecosystems, the A$^2$SL framework is broadly applicable across scientific domains.
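The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes the scenario encoder has already produced fixed-length embeddings and that similarity is measured by cosine similarity; the abstract does not specify either choice, and the function names, scenario labels, and embedding values below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_similar(target_embedding, candidates, k=2):
    """Return the names of the k candidate scenarios closest to the target.

    candidates maps a scenario name to its embedding.
    """
    scored = [
        (cosine_similarity(target_embedding, emb), name)
        for name, emb in candidates.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings standing in for scenario-encoder outputs.
target = [1.0, 0.0, 0.5]
candidates = {
    "lake_A_summer": [0.9, 0.1, 0.4],
    "lake_B_winter": [-1.0, 0.2, 0.0],
    "lake_C_summer": [0.8, 0.0, 0.6],
}

top = retrieve_similar(target, candidates, k=2)
print(top)
```

In this toy setup the two summer scenarios are retrieved and the winter scenario is excluded, mirroring the idea of supplementing a target scenario with data from other locations or time periods that the encoder judges similar.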