🤖 AI Summary
Existing physical simulation benchmarks suffer from limited scale and narrow domain coverage, which hinders comprehensive evaluation of machine learning models on complex dynamical systems. To address this, the authors introduce the Well, a 15 TB, multi-domain, high-fidelity physics simulation collection comprising 16 datasets of spatiotemporal physical systems, including biological dynamics, fluid mechanics, acoustic scattering, and astrophysical magnetohydrodynamics. It is presented as the first benchmark to systematically integrate cross-scale, interdisciplinary numerical simulation data. The datasets use standardized HDF5 storage and a unified PyTorch interface, enabling efficient distributed loading and spatiotemporal modeling. All code, data, and an extensible benchmarking framework are open source. Empirical evaluation with baseline models reveals significant performance bottlenecks for current ML surrogate models in strongly nonlinear, multiscale-coupled regimes. The collection fills a gap in general-purpose evaluation infrastructure for complex physical systems and provides a foundational resource for physics-informed AI research.
📝 Abstract
Machine learning based surrogate models offer researchers powerful tools for accelerating simulation-based workflows. However, as standard datasets in this space often cover small classes of physical behavior, it can be difficult to evaluate the efficacy of new approaches. To address this gap, we introduce the Well: a large-scale collection of datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. The Well draws from domain experts and numerical software developers to provide 15TB of data across 16 datasets covering diverse domains such as biological systems, fluid dynamics, acoustic scattering, and magneto-hydrodynamic simulations of extra-galactic fluids or supernova explosions. These datasets can be used individually or as part of a broader benchmark suite. To facilitate usage of the Well, we provide a unified PyTorch interface for training and evaluating models. We demonstrate the function of this library by introducing example baselines that highlight the new challenges posed by the complex dynamics of the Well. The code and data are available at https://github.com/PolymathicAI/the_well.
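To make the idea of a unified interface for training on spatiotemporal simulations more concrete, the following is a minimal sketch of the kind of windowing such an interface typically performs: cutting each simulated trajectory into (input steps, target steps) pairs. It is not the Well's actual API. The class name `WindowedTrajectoryDataset` and the parameters `n_input` and `n_target` are hypothetical, and plain Python lists stand in for arrays that would in practice be read from HDF5 files. The class follows the `__len__`/`__getitem__` protocol used by map-style PyTorch datasets.

```python
class WindowedTrajectoryDataset:
    """Map-style dataset of (input window, target window) pairs.

    Hypothetical illustration only; this is not the Well's real interface.
    Each trajectory is a sequence of time steps (here: plain floats).
    """

    def __init__(self, trajectories, n_input=2, n_target=1):
        if n_input < 1 or n_target < 1:
            raise ValueError("n_input and n_target must be at least 1")
        self.trajectories = trajectories
        self.n_input = n_input
        self.n_target = n_target
        # Precompute (trajectory id, start step) for every full window.
        # Trajectories shorter than one window contribute no samples.
        window = n_input + n_target
        self.index = [
            (traj_id, start)
            for traj_id, traj in enumerate(trajectories)
            for start in range(len(traj) - window + 1)
        ]

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        traj_id, start = self.index[i]
        traj = self.trajectories[traj_id]
        split = start + self.n_input
        # Input: n_input consecutive steps; target: the n_target steps after.
        return traj[start:split], traj[split:split + self.n_target]


# Toy stand-in for two simulated trajectories of different lengths.
toy_trajectories = [[0.0, 1.0, 2.0, 3.0, 4.0], [10.0, 11.0, 12.0]]
dataset = WindowedTrajectoryDataset(toy_trajectories, n_input=2, n_target=1)
first_inputs, first_target = dataset[0]
```

Precomputing the window index keeps `__getitem__` cheap, which matters when the underlying data is too large to hold in memory and each sample must be read from disk on demand.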