Combined Representation and Generation with Diffusive State Predictive Information Bottleneck

📅 2025-10-10
🤖 AI Summary
In molecular science, high-dimensional generative modeling is hampered by scarce data and by the difficulty of capturing rare events. To address this, we propose the Diffusive State Predictive Information Bottleneck (D-SPIB), a framework that combines a time-lagged predictive information bottleneck with a diffusion model under a single, tunable joint training objective. D-SPIB also accepts simulation trajectories from multiple temperatures, which lets it learn a thermodynamically meaningful low-dimensional latent representation. We benchmark D-SPIB on multiple molecular tasks and show its potential for extrapolating to physical conditions outside the training set.

📝 Abstract
Generative modeling becomes increasingly data-intensive in high-dimensional spaces. In molecular science, where data collection is expensive and important events are rare, compression to lower-dimensional manifolds is especially important for various downstream tasks, including generation. We combine a time-lagged information bottleneck designed to characterize important molecular representations and a diffusion model in one joint training objective. The resulting protocol, which we term Diffusive State Predictive Information Bottleneck (D-SPIB), enables the balancing of representation learning and generation aims in one flexible architecture. Additionally, the model is capable of combining temperature information from different molecular simulation trajectories to learn a coherent and useful internal representation of thermodynamics. We benchmark D-SPIB on multiple molecular tasks and showcase its potential for exploring physical conditions outside the training set.
Problem

Research questions and friction points this paper is trying to address.

Balancing representation learning and generation in molecular modeling
Compressing high-dimensional molecular data to lower-dimensional manifolds
Combining temperature information from different molecular simulation trajectories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining representation learning with diffusion model training
Integrating temperature data from multiple simulation trajectories
Balancing representation and generation in a unified architecture
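
The idea of one joint objective that trades off representation learning against generation can be illustrated with a minimal sketch. This is not the authors' implementation: the page does not give the exact loss, so the code assumes a generic form with three terms, namely a cross-entropy for predicting a time-lagged state label, a KL bottleneck term against a standard Gaussian prior (the paper may use a different prior), and a noise-prediction denoising loss standing in for the diffusion model. The function names and the weights `beta` and `gen_weight` are hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between predicted logits and integer state labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def gaussian_kl(mu, log_var):
    """Batch-mean KL( N(mu, diag(exp(log_var))) || N(0, I) ).

    Assumption: a standard Gaussian prior, chosen only for simplicity.
    """
    per_sample = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return per_sample.mean()

def denoising_loss(eps_pred, eps_true):
    """Batch-mean squared error between predicted and true noise."""
    return np.mean(np.sum((eps_pred - eps_true) ** 2, axis=1))

def joint_loss(logits, future_labels, mu, log_var, eps_pred, eps_true,
               beta=0.01, gen_weight=1.0):
    """Hypothetical joint objective: prediction + beta * bottleneck + gen_weight * generation."""
    prediction = softmax_cross_entropy(logits, future_labels)  # time-lagged target
    bottleneck = gaussian_kl(mu, log_var)                      # compression term
    generation = denoising_loss(eps_pred, eps_true)            # diffusion term
    return prediction + beta * bottleneck + gen_weight * generation

# Usage with random stand-ins for network outputs (batch of 8, 3 states, 2 latent dims).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 3))
future_labels = rng.integers(0, 3, size=8)   # state labels one lag time ahead
mu, log_var = rng.normal(size=(8, 2)), rng.normal(size=(8, 2))
eps_true = rng.normal(size=(8, 2))
eps_pred = eps_true + 0.1 * rng.normal(size=(8, 2))
loss = joint_loss(logits, future_labels, mu, log_var, eps_pred, eps_true)
```

In this sketch, raising `gen_weight` favors generation quality, while raising `beta` favors a more compressed representation. The abstract describes this kind of balance but does not specify how it is parameterized.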
Richard John
Department of Physics, Institute for Physical Science and Technology, University of Maryland
Yunrui Qiu
Institute for Physical Science and Technology, Institute for Health Computing, University of Maryland
Lukas Herron
Biophysics Program, Institute for Physical Science and Technology, Institute for Health Computing, University of Maryland
Pratyush Tiwary
Millard and Lee Alexander Professor in Chemical Physics, University of Maryland, College Park
statistical mechanics, rare events, machine learning