🤖 AI Summary
This work addresses the challenge of modeling equilibrium distributions and learning low-dimensional, interpretable representations for complex molecular systems. Methodologically, it introduces a novel framework that jointly optimizes the State Predictive Information Bottleneck (SPIB) and Normalizing Flows in an end-to-end manner, enabling simultaneous learning of slow collective variables, identification of metastable states, and temperature-conditional distribution generation. Variational inference is incorporated to enable thermodynamic reconstruction and temperature extrapolation from limited simulation data. Validated on an RNA tetraloop system, the method accurately reconstructs conformational ensembles and melting behavior using simulation data from only two temperatures, in agreement with experimental measurements and prior extensive computational results. Crucially, it overcomes the limitation of conventional collective-variable approaches that rely on prior knowledge or strong physical assumptions. The proposed framework establishes a unified, interpretable, thermodynamically consistent, and generalizable paradigm for molecular dynamics analysis.
📝 Abstract
Accurate characterization of the equilibrium distributions of complex molecular systems and their dependence on environmental factors such as temperature is essential for understanding thermodynamic properties and transition mechanisms. Projecting these distributions onto meaningful low-dimensional representations enables interpretability and downstream analysis. Recent advances in generative AI, particularly flow models such as Normalizing Flows (NFs), have shown promise in modeling such distributions, but their scope is limited without tailored representation learning. In this work, we introduce Latent Thermodynamic Flows (LaTF), an end-to-end framework that tightly integrates representation learning and generative modeling. LaTF unifies the State Predictive Information Bottleneck (SPIB) with NFs to simultaneously learn low-dimensional latent representations, referred to as Collective Variables (CVs), classify metastable states, and generate equilibrium distributions across temperatures beyond the training data. The two components of representation learning and generative modeling are optimized jointly, ensuring that the learned latent features capture the system's slow, important degrees of freedom while the generative model accurately reproduces the system's equilibrium behavior. We demonstrate LaTF's effectiveness across diverse systems, including a model potential, the Chignolin protein, and a cluster of Lennard-Jones particles, with thorough evaluations and benchmarking using multiple metrics and extensive simulations. Finally, we apply LaTF to an RNA tetraloop system, where despite using simulation data from only two temperatures, LaTF reconstructs the temperature-dependent structural ensemble and melting behavior, consistent with experimental and prior extensive computational results.
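The joint optimization described above, where an information-bottleneck representation loss and a flow likelihood loss are minimized together, can be sketched in minimal NumPy. This is only an illustrative sketch under stated assumptions, not the paper's implementation: the function names (`coupling_forward`, `nf_nll`, `joint_loss`), the single affine coupling layer, and the weighting coefficients `beta` and `gamma` are all assumptions introduced here for clarity.

```python
import numpy as np

def coupling_forward(z, s, t):
    """One affine coupling layer (RealNVP-style): the second half of the
    latent vector is scaled and shifted conditioned on the first half.
    Returns the transformed sample and the log-determinant of the Jacobian."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    y2 = z2 * np.exp(s(z1)) + t(z1)
    log_det = np.sum(s(z1), axis=-1)
    return np.concatenate([z1, y2], axis=-1), log_det

def nf_nll(z, s, t):
    """Negative log-likelihood of latent samples z under a flow with a
    standard-normal base: log p(z) = log N(f(z); 0, I) + log|det df/dz|."""
    y, log_det = coupling_forward(z, s, t)
    log_base = -0.5 * np.sum(y**2 + np.log(2.0 * np.pi), axis=-1)
    return -(log_base + log_det)

def joint_loss(ce_state, kl_latent, nll_flow, beta=1e-3, gamma=1.0):
    """Toy combined objective: state-prediction cross-entropy plus a
    beta-weighted KL bottleneck term (the SPIB part) plus a gamma-weighted
    flow NLL on the latent samples (the NF part)."""
    return ce_state + beta * kl_latent + gamma * np.mean(nll_flow)
```

In an actual end-to-end setup, `s` and `t` would be trainable networks and gradients of the combined loss would flow back through both the flow and the bottleneck encoder, which is what couples the learned CVs to the generative model.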