MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models

📅 2025-09-29
🤖 AI Summary
Molecular dynamics (MD) simulations provide mechanistic insight into protein function, but they are computationally expensive and often undersample long-timescale conformational transitions. Existing generative models typically learn transition densities at a fixed lag time, so their training signal is dominated by frequent but uninformative local transitions. To address this, the authors propose MSM Emulators, a class of generative models that learn to sample transitions between the discrete states of an underlying Markov State Model (MSM), and instantiate it as Markov Space Flow Matching (MarS-FM). Evaluated on protein domains of up to 500 residues, with strict sequence dissimilarity between training and test sets, MarS-FM samples more than two orders of magnitude faster than implicit- or explicit-solvent MD while reproducing structural statistics such as RMSD, radius of gyration, and secondary structure content, and it outperforms existing generative trajectory models.

📝 Abstract
Molecular Dynamics (MD) is a powerful computational microscope for probing protein functions. However, the need for fine-grained integration and the long timescales of biomolecular events make MD computationally expensive. To address this, several generative models have been proposed to generate surrogate trajectories at lower cost. Yet, these models typically learn a fixed-lag transition density, causing the training signal to be dominated by frequent but uninformative transitions. We introduce a new class of generative models, MSM Emulators, which instead learn to sample transitions across discrete states defined by an underlying Markov State Model (MSM). We instantiate this class with Markov Space Flow Matching (MarS-FM), whose sampling offers more than two orders of magnitude speedup compared to implicit- or explicit-solvent MD simulations. We benchmark MarS-FM's ability to reproduce MD statistics through structural observables such as RMSD, radius of gyration, and secondary structure content. Our evaluation spans protein domains (up to 500 residues) with significant chemical and structural diversity, including unfolding events, and enforces strict sequence dissimilarity between training and test sets to assess generalization. Across all metrics, MarS-FM outperforms existing methods, often by a substantial margin.
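Two of the structural observables named in the abstract, radius of gyration and RMSD, can be computed per frame from Cartesian coordinates. The following is a minimal NumPy sketch under stated assumptions, not the paper's evaluation code: the function names `radius_of_gyration` and `kabsch_rmsd` are hypothetical, coordinates are assumed to be arrays of shape (N, 3), and RMSD is taken after optimal rigid superposition with the Kabsch algorithm, which is a common convention but may differ from the benchmark's exact protocol.

```python
import numpy as np

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration of one frame, coords shape (N, 3)."""
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))  # assumption: equal masses if none given
    masses = np.asarray(masses, dtype=float)
    center = np.average(coords, axis=0, weights=masses)
    sq_dist = np.sum((coords - center) ** 2, axis=1)
    return float(np.sqrt(np.average(sq_dist, weights=masses)))

def kabsch_rmsd(mobile, reference):
    """RMSD between two frames after optimal rigid superposition (Kabsch)."""
    p = np.asarray(mobile, dtype=float)
    q = np.asarray(reference, dtype=float)
    # Remove translation by centering both frames.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip the last axis if needed so the result is a proper rotation.
    d = -1.0 if np.linalg.det(u @ vt) < 0 else 1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = p @ rotation - q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))
```

Comparing the distributions of these values over generated and reference MD frames is one way to check whether a surrogate trajectory reproduces the MD statistics.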
Problem

Research questions and friction points this paper is trying to address.

Generating molecular dynamics trajectories at reduced computational cost
Addressing limitations of fixed-lag transition density learning in generative models
Sampling transitions across discrete states defined by Markov State Models
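The fixed-lag limitation listed above can be illustrated on a toy discrete trajectory. The sketch below is an assumption-laden illustration in plain Python, not the authors' pipeline: the helper names (`count_transitions`, `transition_matrix`, `self_transition_fraction`) and the toy trajectory are hypothetical, and the transition matrix is a simple row-normalized count estimate rather than a reversible MSM estimator.

```python
from collections import Counter

def count_transitions(dtraj, lag):
    """Count (i -> j) pairs separated by `lag` frames in a discrete trajectory."""
    return Counter(zip(dtraj[:-lag], dtraj[lag:]))

def transition_matrix(counts, n_states):
    """Row-normalize counts into a simple (non-reversible) transition estimate."""
    matrix = [[0.0] * n_states for _ in range(n_states)]
    for (i, j), c in counts.items():
        matrix[i][j] = float(c)
    for row in matrix:
        total = sum(row)
        if total > 0:
            for j in range(n_states):
                row[j] /= total
    return matrix

def self_transition_fraction(counts):
    """Fraction of fixed-lag pairs that start and end in the same state."""
    total = sum(counts.values())
    same = sum(c for (i, j), c in counts.items() if i == j)
    return same / total

# Toy trajectory: long dwells in each state with only two state changes.
dtraj = [0] * 50 + [1] * 50 + [0] * 50
counts = count_transitions(dtraj, lag=1)
T = transition_matrix(counts, n_states=2)
print(self_transition_fraction(counts))
```

In this toy case 147 of the 149 fixed-lag pairs stay within the same state, so a model trained on such pairs sees the state-changing transitions only rarely.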
Innovation

Methods, ideas, or system contributions that make the work stand out.

Emulates Markov State Models for molecular dynamics
Uses discrete states to sample transitions efficiently
Achieves a more than 100× sampling speedup over implicit- or explicit-solvent MD simulations
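The discrete-state bookkeeping behind sampling transitions across MSM states can be sketched as a jump process over a transition matrix. This is only an illustration under assumptions and not the MarS-FM implementation: the function `sample_state_path` and the 3-state matrix `T` are hypothetical, and the sketch stops at the state sequence, whereas a generative model would additionally have to produce a conformation for each visited state.

```python
import random

def sample_state_path(transition_matrix, start, n_steps, seed=0):
    """Sample a sequence of discrete states from a row-stochastic matrix."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    states = list(range(len(transition_matrix)))
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Draw the next state with probabilities given by the current row.
        nxt = rng.choices(states, weights=transition_matrix[current], k=1)[0]
        path.append(nxt)
    return path

# Hypothetical 3-state matrix: states 0 and 2 are connected only via state 1.
T = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]
path = sample_state_path(T, start=0, n_steps=1000)
```

Each step here corresponds to one MSM lag time rather than one integration step, which is where a discrete-state emulator can save work relative to fine-grained MD integration.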