The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning

📅 2024-11-30
🏛️ Neural Information Processing Systems
📈 Citations: 1
Influential: 1
🤖 AI Summary
Existing physics simulation benchmarks tend to be limited in scale and narrow in domain coverage, which makes it hard to evaluate machine learning models on complex dynamical systems. To address this, we introduce the Well, a 15TB collection of 16 datasets of numerical simulations of spatiotemporal physical systems, spanning biological systems, fluid dynamics, acoustic scattering, and magneto-hydrodynamic simulations of extra-galactic fluids and supernova explosions. The data is stored in a standardized HDF5 format and exposed through a unified PyTorch interface for training and evaluating models. The datasets can be used individually or as part of a broader benchmark suite, and the code and data are openly available. Example baselines highlight the challenges that the Well's complex dynamics pose for current ML surrogate models.

📝 Abstract
Machine learning-based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, as well as magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
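
To make the idea of a unified interface over spatiotemporal simulations more concrete, here is a minimal sketch of how trajectories could be sliced into (input frames, target frames) pairs for surrogate training. It is not the Well's actual API: the class name `WindowedTrajectoryDataset`, the parameters `n_steps_input` and `n_steps_output`, and the nested-list data layout are all hypothetical and chosen for illustration. The sketch uses only the Python standard library and follows the map-style dataset protocol (`__len__` and `__getitem__`) that PyTorch data loaders consume.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one frame is a flattened snapshot of a physical field.
Frame = List[float]
Trajectory = Sequence[Frame]


class WindowedTrajectoryDataset:
    """Map-style dataset of (input frames, target frames) windows.

    Illustrative stand-in only; this is NOT the Well's own interface.
    """

    def __init__(
        self,
        trajectories: Sequence[Trajectory],
        n_steps_input: int = 2,
        n_steps_output: int = 1,
    ) -> None:
        if n_steps_input < 1 or n_steps_output < 1:
            raise ValueError("window sizes must be at least 1")
        self.trajectories = trajectories
        self.n_in = n_steps_input
        self.n_out = n_steps_output
        # Precompute (trajectory id, start step) for every complete window.
        self.index: List[Tuple[int, int]] = []
        window = self.n_in + self.n_out
        for traj_id, traj in enumerate(trajectories):
            n_windows = len(traj) - window + 1
            for start in range(max(n_windows, 0)):
                self.index.append((traj_id, start))

    def __len__(self) -> int:
        return len(self.index)

    def __getitem__(self, i: int) -> Tuple[List[Frame], List[Frame]]:
        traj_id, start = self.index[i]
        traj = self.trajectories[traj_id]
        split = start + self.n_in
        inputs = list(traj[start:split])
        targets = list(traj[split:split + self.n_out])
        return inputs, targets
```

For example, two toy trajectories of five steps each, with two input steps and one output step, give three windows per trajectory. In a real pipeline the frames would be tensors read from the HDF5 files instead of Python lists, and the dataset would be wrapped in a data loader for batching.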
Problem

Research questions and friction points this paper is trying to address.

Standard surrogate-modeling datasets cover only small classes of physical behavior.
This narrow coverage makes it difficult to evaluate the efficacy of new approaches.
Training and evaluating models across many physical systems calls for a unified interface.
Innovation

Methods, ideas, or system contributions that make the work stand out.

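15TB of numerical simulations across 16 datasets spanning diverse physical domains
Unified PyTorch interface for training and evaluating models
Example baselines that highlight the challenges posed by the Well's complex dynamics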
Ruben Ohana
Senior Research Scientist, NVIDIA
Machine Learning, AI for Science, Computer Vision, Optical Computing
Michael McCabe
Flatiron Institute
Machine learning, computational science, optimization, numerical analysis
Lucas Meyer
Polymathic AI
Rudy Morel
Polymathic AI, Flatiron Institute
Fruzsina J. Agocs
Flatiron Institute, University of Colorado, Boulder
M. Beneitez
University of Cambridge
Marsha Berger
Flatiron Institute, New York University
B. Burkhart
Flatiron Institute, Rutgers University
S. Dalziel
University of Cambridge
D. Fielding
Flatiron Institute, Cornell University
Daniel Fortunato
Flatiron Institute
Jared A. Goldberg
Flatiron Institute
Keiya Hirashima
RIKEN Center for Interdisciplinary Theoretical and Mathematical Sciences
Machine learning, HPC, Galaxy formation and evolution
Yan-Fei Jiang
Flatiron Institute
R. Kerswell
University of Cambridge
S. Maddu
Flatiron Institute, University of Cambridge
Jonah Miller
Los Alamos National Laboratory
Payel Mukhopadhyay
University of California, Berkeley
Stefan S. Nixon
University of Cambridge
Jeff Shen
Princeton University
R. Watteaux
CEA DAM
Bruno Régaldo-Saint Blancard
Polymathic AI, Flatiron Institute
François Rozet
PhD student, University of Liège
deep learning, generative modeling, bayesian inference, physics emulation, good shit
L. Parker
Polymathic AI, University of California, Berkeley
M. Cranmer
Polymathic AI, University of Cambridge
Shirley Ho
Flatiron Institute, Center for Computational Astrophysics
Cosmology, Astrophysics, Machine Learning, Statistics