Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking

📅 2025-10-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Protein–ligand binding pose prediction is critical for structure-based drug design, yet existing methods struggle to simultaneously achieve high accuracy, computational efficiency, and physical plausibility. To address this, we propose a multi-stage Riemannian flow matching framework that optimizes ligand conformations sequentially in the geometric spaces ℝ³, SO(3), and SO(2). Our method models the molecular conformational manifold using Lie group theory and integrates a learnable scoring model with an unsupervised physics-based validity filter. A deep neural network jointly performs generative pose prediction and scoring. On the Astex and PDBbind benchmarks, our approach significantly improves docking success rates (Top-1 RMSD < 2 Å) and physical realism while achieving 25× faster inference than large co-folding models. The code and pretrained models are publicly available.
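The success criterion quoted above (Top-1 RMSD < 2 Å) can be illustrated with a minimal sketch. It assumes predicted and reference ligand coordinates share the same atom ordering and ignores symmetry-equivalent atoms, which benchmark tooling normally corrects for; the function names `rmsd` and `top1_success_rate` are hypothetical and do not come from the Matcha codebase.

```python
import numpy as np

def rmsd(pred, ref):
    """RMSD in Å between two (N, 3) coordinate arrays with matched atom order."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")
    # Mean of squared per-atom distances, then square root.
    return float(np.sqrt(np.mean(np.sum((pred - ref) ** 2, axis=1))))

def top1_success_rate(top1_poses, ref_poses, threshold=2.0):
    """Fraction of complexes whose top-ranked pose has RMSD strictly below `threshold`."""
    hits = [rmsd(p, r) < threshold for p, r in zip(top1_poses, ref_poses)]
    return sum(hits) / len(hits)

# Toy data: one pose shifted by 1 Å (success), one shifted by 3 Å (failure).
ref = np.zeros((4, 3))
good = ref + np.array([1.0, 0.0, 0.0])
bad = ref + np.array([3.0, 0.0, 0.0])
rate = top1_success_rate([good, bad], [ref, ref])
```

No alignment is applied before computing the RMSD, since docking poses are compared in the fixed frame of the protein.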

📝 Abstract

Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a novel molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three stages applied sequentially to refine docking predictions, each implemented as a flow matching model operating on appropriate geometric spaces ($\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$). We enhance prediction quality with a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared with a range of existing approaches, Matcha demonstrates superior performance on the Astex and PDBbind test sets in terms of docking success rate and physical plausibility. Moreover, our method runs approximately 25 times faster than modern large-scale co-folding models. The model weights and inference code to reproduce our results are available at https://github.com/LigandPro/Matcha.
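To make the three geometric spaces concrete, the sketch below shows the standard geodesic interpolants often used as conditional paths in Riemannian flow matching: a straight line in ℝ³, the shortest arc in SO(2), and the exponential-map geodesic in SO(3). The abstract does not state which paths, priors, or parameterizations Matcha uses, so this is an assumption-laden illustration and not the authors' implementation; all function names are hypothetical.

```python
import numpy as np

def interp_r3(x0, x1, t):
    """Straight-line path in R^3 and its constant velocity x1 - x0."""
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    return (1 - t) * x0 + t * x1, x1 - x0

def wrap_angle(a):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (np.asarray(a, float) + np.pi) % (2 * np.pi) - np.pi

def interp_so2(a0, a1, t):
    """Shortest-arc path between two angles and its constant angular velocity."""
    delta = wrap_angle(a1 - a0)
    return wrap_angle(a0 + t * delta), delta

def so3_exp(w):
    """Rodrigues formula: rotation vector (axis times angle) to a 3x3 rotation matrix."""
    w = np.asarray(w, float)
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near the identity
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1 - np.cos(theta)) / theta**2 * (K @ K))

def so3_log(R):
    """Rotation matrix to rotation vector; valid for rotation angles below pi."""
    cos_theta = np.clip((np.trace(R) - 1) / 2, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return v / 2
    return theta / (2 * np.sin(theta)) * v

def interp_so3(R0, R1, t):
    """Geodesic R0 exp(t log(R0^T R1)) and its constant body-frame angular velocity."""
    w = so3_log(R0.T @ R1)
    return R0 @ so3_exp(t * w), w
```

In a flow matching setup, each function's second return value would serve as the regression target for the learned vector field at time `t`. A natural reading is that ℝ³ covers ligand translation, SO(3) rigid rotation, and SO(2) each torsion angle, though the abstract itself does not spell out this assignment.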

Problem

Research questions and friction points this paper is trying to address.

Accurately predicting protein-ligand binding poses

Balancing speed, accuracy and physical plausibility

Refining docking predictions through multi-stage geometric modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage flow matching on geometric spaces

Learned scoring model enhances prediction quality

Unsupervised physical validity filters eliminate unrealistic poses
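As one illustration of what an unsupervised physical validity filter might check, the sketch below discards poses in which any ligand atom sits too close to a protein atom. The source does not list the checks Matcha applies, so the steric-clash criterion, the 2.0 Å cutoff, and the function names are assumptions chosen purely for demonstration.

```python
import numpy as np

def min_protein_ligand_distance(ligand_xyz, protein_xyz):
    """Smallest pairwise distance (Å) between (N, 3) ligand and (M, 3) protein atoms."""
    ligand_xyz = np.asarray(ligand_xyz, dtype=float)
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    # Broadcast to an (N, M, 3) array of coordinate differences.
    diff = ligand_xyz[:, None, :] - protein_xyz[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).min())

def filter_clashing_poses(poses, protein_xyz, min_dist=2.0):
    """Keep poses whose closest ligand-protein atom pair is at least `min_dist` Å apart.

    `min_dist` is an illustrative cutoff, not a value taken from the paper.
    """
    return [p for p in poses
            if min_protein_ligand_distance(p, protein_xyz) >= min_dist]

# Toy example: a single protein atom at the origin and two one-atom "poses".
protein = np.array([[0.0, 0.0, 0.0]])
pose_ok = np.array([[3.0, 0.0, 0.0]])
pose_clash = np.array([[1.0, 0.0, 0.0]])
kept = filter_clashing_poses([pose_ok, pose_clash], protein)
```

A filter like this needs no labels, which matches the "unsupervised" description. Fuller validity checks typically also cover bond lengths, bond angles, and internal ligand clashes.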

🔎 Similar Papers

FABind+: Enhancing Molecular Docking through Improved Pocket Prediction and Pose Generation