Machine Learning Workflows in Climate Modeling: Design Patterns and Insights from Case Studies

📅 2025-09-30
🤖 AI Summary
This paper addresses core challenges in applying machine learning (ML) to climate modeling, including physical inconsistency, multi-scale coupling, data sparsity, weak generalization, and poor integration with scientific workflows. It analyzes case studies to synthesize ML workflows that are grounded in physical knowledge, informed by simulation data, and designed to integrate observations. The workflows covered span surrogate modeling, ML-driven parameterization, probabilistic programming, simulation-based inference, and physics-informed transfer learning. Key contributions include: (i) distilling workflow design patterns across diverse projects; (ii) offering a framework for rigor in scientific ML that supports transparent model development, critical evaluation, informed adaptation, and reproducibility; and (iii) lowering the barrier to interdisciplinary collaboration between data science and climate modeling.

📝 Abstract
Machine learning has been increasingly applied in climate modeling for system emulation acceleration, data-driven parameter inference, forecasting, and knowledge discovery, addressing challenges such as physical consistency, multi-scale coupling, data sparsity, robust generalization, and integration with scientific workflows. This paper analyzes a series of case studies from applied machine learning research in climate modeling, with a focus on design choices and workflow structure. Rather than reviewing technical details, we aim to synthesize workflow design patterns across diverse projects in ML-enabled climate modeling, from surrogate modeling, ML parameterization, and probabilistic programming to simulation-based inference and physics-informed transfer learning. We unpack how these workflows are grounded in physical knowledge, informed by simulation data, and designed to integrate observations. We aim to offer a framework for ensuring rigor in scientific machine learning through more transparent model development, critical evaluation, informed adaptation, and reproducibility, and to help lower the barrier to interdisciplinary collaboration at the interface of data science and climate modeling.
Problem

Research questions and friction points this paper is trying to address.

Developing ML workflows for climate modeling applications
Ensuring physical consistency in machine learning climate models
Integrating scientific workflows with data-driven climate approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surrogate modeling for system emulation acceleration
Physics-informed transfer learning for robust generalization
Simulation-based inference integrating observational data
Tian Zheng
Department of Statistics, Columbia University, New York, New York; NSF STC Learning the Earth with AI and Physics (LEAP), New York, New York
Subashree Venkatasubramanian
NSF STC Learning the Earth with AI and Physics (LEAP), New York, New York
Shuolin Li
NSF STC Learning the Earth with AI and Physics (LEAP), New York, New York; Data Science Institute, Columbia University, New York, New York
Amy Braverman
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California
Xinyi Ke
Department of Statistics, Columbia University, New York, New York; NSF STC Learning the Earth with AI and Physics (LEAP), New York, New York
Zhewen Hou
Department of Statistics, Columbia University, New York, New York
Peter Jin
UC Berkeley
Machine Learning, Artificial Intelligence
Samarth Sanjay Agrawal
NSF STC Learning the Earth with AI and Physics (LEAP), New York, New York