🤖 AI Summary
Aerosol–cloud–radiation interactions are among the most uncertain components of climate modeling, in part because aerosol states are high-dimensional and hard to measure completely. To address this, we propose a deep variational autoencoder (VAE) framework with three elements: (i) a realism metric based on the sliced Wasserstein distance between generated samples and a held-out test set, used to optimize the KL divergence weight; (ii) a preprocessing optimization strategy that avoids repeated retraining and makes the latent representation robust to high-magnitude Gaussian noise; and (iii) evaluation against physically grounded diagnostics, including cloud condensation nuclei (CCN) spectra, optical scattering and absorption coefficients, and ice nucleation properties. The model compresses hundreds of dimensions of speciated aerosol mass and number size distributions into a 10-dimensional latent space while preserving the fidelity of these diagnostics. CCN spectra are reconstructed most accurately, optical properties are moderately difficult, and ice nucleation properties are the most challenging. The compact representation reduces storage and processing costs, supporting aerosol state emulation in large-scale climate applications.
📝 Abstract
Aerosol--cloud--radiation interactions remain among the most uncertain components of the Earth's climate system, in part due to the high dimensionality of aerosol state representations and the difficulty of obtaining complete \textit{in situ} measurements. Addressing these challenges requires methods that distill complex aerosol properties into compact yet physically meaningful forms. Generative autoencoder models provide such a pathway. We present a framework for learning deep variational autoencoder (VAE) models of speciated mass and number concentration distributions, which capture detailed aerosol size-composition characteristics. By compressing hundreds of original dimensions into ten latent variables, the approach enables efficient storage and processing while preserving the fidelity of key diagnostics, including cloud condensation nuclei (CCN) spectra, optical scattering and absorption coefficients, and ice nucleation properties. Results show that CCN spectra are the easiest to reconstruct accurately, optical properties are moderately difficult, and ice nucleation properties are the most challenging. To improve performance, we introduce a preprocessing optimization strategy that avoids repeated retraining and yields latent representations resilient to high-magnitude Gaussian noise, boosting accuracy for CCN spectra, optical coefficients, and frozen fraction spectra. Finally, we propose a novel realism metric -- based on the sliced Wasserstein distance between generated samples and a held-out test set -- for optimizing the KL divergence weight in VAEs. Together, these contributions enable compact, robust, and physically meaningful representations of aerosol states for large-scale climate applications.
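The realism metric described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the common Monte Carlo estimator of the sliced 1-Wasserstein distance (random unit projections, then a sorted one-dimensional comparison) and equally sized sample sets. The names `sliced_wasserstein` and `pick_kl_weight`, the number of projections, and the idea of scoring a dictionary of pre-generated samples per candidate weight are all hypothetical choices made for illustration.

```python
import numpy as np


def sliced_wasserstein(x, y, n_projections=128, seed=0):
    """Monte Carlo estimate of the sliced 1-Wasserstein distance.

    x, y: arrays of shape (n_samples, n_features) with identical shapes
    (an assumption of this sketch, which keeps the 1-D step trivial).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("sketch assumes two 2-D arrays of identical shape")

    rng = np.random.default_rng(seed)
    # Draw random directions and normalize them to unit length.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Project both sample sets onto every direction and sort each projection.
    x_proj = np.sort(x @ directions.T, axis=0)
    y_proj = np.sort(y @ directions.T, axis=0)

    # For equal-size samples, the 1-D W1 distance is the mean absolute
    # difference of sorted values; average that over all projections.
    return float(np.mean(np.abs(x_proj - y_proj)))


def pick_kl_weight(generated_by_weight, test_set):
    """Hypothetical helper: choose the KL weight whose generated samples
    are closest to the held-out test set under the sliced distance."""
    scores = {
        weight: sliced_wasserstein(samples, test_set)
        for weight, samples in generated_by_weight.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

In this sketch, `generated_by_weight` would map each candidate KL weight to samples decoded from a VAE trained with that weight, and `test_set` would hold the same number of held-out aerosol states. A lower score means the generated samples are distributed more like the held-out data.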