Matcha: Multi-Stage Riemannian Flow Matching for Accurate and Physically Valid Molecular Docking

📅 2025-10-16
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Protein-ligand binding pose prediction is critical for structure-based drug design, yet existing methods struggle to achieve high accuracy, computational efficiency, and physical plausibility at the same time. To address this, we propose a multi-stage Riemannian flow matching framework that refines ligand poses sequentially in the geometric spaces ℝ³, SO(3), and SO(2). Our method models the ligand's conformational manifold with Lie group theory and combines a learned scoring model with an unsupervised, physics-based validity filter. Deep neural networks handle both generative pose prediction and scoring. On the Astex and PDBbind benchmarks, our approach improves both the docking success rate (Top-1 pose RMSD < 2 Å) and physical plausibility, while running about 25× faster at inference than large co-folding models. The code and pretrained models are publicly available.
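The headline metric is the docking success rate: the share of complexes whose top-ranked (Top-1) pose lies within 2 Å RMSD of the crystallographic ligand. A minimal sketch in Python, assuming ligand atoms are already matched one-to-one and that both poses sit in the protein's frame; it skips the symmetry correction that benchmark tooling usually applies, and the function names are illustrative rather than taken from the Matcha code.

```python
import math
from typing import Sequence

Coord = tuple[float, float, float]


def rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """In-place heavy-atom RMSD (no superposition, no symmetry correction).

    Assumes pred[i] and ref[i] describe the same atom.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must have the same non-zero number of atoms")
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(pred))


def top1_success_rate(top1_poses, references, threshold: float = 2.0) -> float:
    """Fraction of complexes whose top-ranked pose is within `threshold` Å RMSD."""
    hits = sum(rmsd(p, r) < threshold for p, r in zip(top1_poses, references))
    return hits / len(references)


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
good = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]  # every atom shifted by 0.5 Å
bad = [(3.0, 0.0, 0.0), (4.0, 0.0, 0.0)]   # every atom shifted by 3 Å
print(top1_success_rate([good, bad], [ref, ref]))
```

Here the first complex counts as a success (RMSD 0.5 Å) and the second does not (RMSD 3 Å), so the rate is 0.5.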

๐Ÿ“ Abstract
Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach consists of three stages applied sequentially to refine docking predictions, each implemented as a flow matching model operating on an appropriate geometric space ($\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$). We improve prediction quality with a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared with a range of existing approaches, Matcha achieves higher docking success rates and better physical plausibility on the Astex and PDBbind test sets. Moreover, our method runs approximately 25 times faster than modern large-scale co-folding models. The model weights and inference code to reproduce our results are available at https://github.com/LigandPro/Matcha.
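Flow matching trains a network to regress the velocity of a path that carries a noise sample to the true pose. The pure-Python sketch below builds such paths and their target velocities on each of the three spaces, assuming (as in DiffDock-style docking) that ℝ³ holds the ligand translation, SO(3) its rigid-body rotation, and SO(2) each rotatable-bond torsion. All function names are hypothetical; this is not the Matcha implementation, and it omits the noise distributions, the network, and the time schedule.

```python
import math

Vec3 = list[float]
Mat3 = list[list[float]]


def r3_interpolant(x0: Vec3, x1: Vec3, t: float) -> tuple[Vec3, Vec3]:
    """Straight line x_t = (1 - t) x0 + t x1; the target velocity x1 - x0 is constant."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


def matmul(A: Mat3, B: Mat3) -> Mat3:
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(A: Mat3) -> Mat3:
    return [[A[j][i] for j in range(3)] for i in range(3)]


def hat(w: Vec3) -> Mat3:
    """Skew-symmetric matrix [w]x, so that hat(w) applied to v equals cross(w, v)."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]


def so3_exp(w: Vec3) -> Mat3:
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = math.sqrt(sum(c * c for c in w))
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        K = hat(w)  # first-order approximation near the identity
        return [[eye[i][j] + K[i][j] for j in range(3)] for i in range(3)]
    K = hat([c / theta for c in w])
    K2 = matmul(K, K)
    s, c1 = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + s * K[i][j] + c1 * K2[i][j] for j in range(3)] for i in range(3)]


def so3_log(R: Mat3) -> Vec3:
    """Rotation matrix -> axis-angle vector.

    Assumes the rotation angle is below pi; the formula is ill-conditioned for half-turns.
    """
    cos_theta = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    skew = [R[2][1] - R[1][2], R[0][2] - R[2][0], R[1][0] - R[0][1]]
    scale = 0.5 if theta < 1e-12 else theta / (2.0 * math.sin(theta))
    return [scale * v for v in skew]


def so3_interpolant(R0: Mat3, R1: Mat3, t: float) -> tuple[Mat3, Vec3]:
    """Geodesic R_t = R0 Exp(t Log(R0^T R1)).

    Its body-frame angular velocity Log(R0^T R1) is constant along the path
    and serves as the regression target.
    """
    omega = so3_log(matmul(transpose(R0), R1))
    return matmul(R0, so3_exp([t * v for v in omega])), omega


def wrap(angle: float) -> float:
    """Map an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def so2_interpolant(theta0: float, theta1: float, t: float) -> tuple[float, float]:
    """Shortest arc on the circle; the target velocity is the wrapped angular difference."""
    d = wrap(theta1 - theta0)
    return wrap(theta0 + t * d), d
```

Training would then sample t uniformly, draw the start point from a noise distribution on each space, and regress the predicted velocity onto these targets. At inference the learned velocity field is integrated from t = 0 to t = 1, for example with Euler steps. Using geodesics (the straight line, the exp/log path on SO(3), the shortest arc on the circle) keeps each target velocity constant along its path.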
Problem

Research questions and friction points this paper is trying to address.

Accurately predicting protein-ligand binding poses
Balancing speed, accuracy and physical plausibility
Refining docking predictions through multi-stage geometric modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage flow matching on geometric spaces
Learned scoring model enhances prediction quality
Unsupervised physical validity filters eliminate unrealistic poses
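One plausible way to combine the scoring model with the validity filter is to discard implausible poses first and then keep the best-scoring survivor as the Top-1 prediction. The Python sketch below assumes that ordering, assumes higher scores are better, and uses a single protein-ligand clash check with a hypothetical 2 Å cutoff as a stand-in for the paper's filters; the `Pose` class, threshold, and fallback rule are illustrative, not taken from Matcha.

```python
import math
from dataclasses import dataclass

Coord = tuple[float, float, float]

# Hypothetical clash cutoff in Å. Real validity checks cover much more
# (bond lengths and angles, stereochemistry, internal clashes, ...).
MIN_PROTEIN_LIGAND_DIST = 2.0


@dataclass
class Pose:
    coords: list[Coord]  # ligand heavy-atom coordinates (Å)
    score: float         # scoring-model output; higher is assumed better


def has_no_clash(pose: Pose, protein: list[Coord],
                 min_dist: float = MIN_PROTEIN_LIGAND_DIST) -> bool:
    """True if every ligand atom is at least `min_dist` Å from every protein atom."""
    return all(math.dist(a, p) >= min_dist for a in pose.coords for p in protein)


def select_top1(poses: list[Pose], protein: list[Coord]) -> Pose:
    """Drop physically implausible poses, then return the best-scoring survivor.

    Falls back to the best-scoring pose overall if every candidate is filtered out.
    """
    valid = [p for p in poses if has_no_clash(p, protein)]
    pool = valid or poses
    return max(pool, key=lambda p: p.score)


protein = [(0.0, 0.0, 0.0)]
poses = [
    Pose([(1.0, 0.0, 0.0)], 0.9),  # highest score, but clashes (1 Å from the protein)
    Pose([(3.0, 0.0, 0.0)], 0.7),
    Pose([(5.0, 0.0, 0.0)], 0.4),
]
print(select_top1(poses, protein).score)
```

The highest-scoring candidate is rejected for clashing, so the pose scored 0.7 is selected. Filtering before ranking means the scorer never has to learn to penalize obviously broken geometry on its own.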
Authors

Daria Frolova
Ligand Pro, Moscow, Russia; Skolkovo Institute of Science and Technology, Artificial Intelligence Center, Moscow, Russia

Talgat Daulbaev
Unknown affiliation
Machine Learning, Deep Learning, Numerical Methods

Egor Sevryugov
Ligand Pro, Moscow, Russia; Skolkovo Institute of Science and Technology, Artificial Intelligence Center, Moscow, Russia

Sergei A. Nikolenko
Ligand Pro, Moscow, Russia

Dmitry N. Ivankov
Skolkovo Institute of Science and Technology, Center for Molecular and Cellular Biology, Moscow, Russia; Ligand Pro, Moscow, Russia

Ivan Oseledets
AIRI; Skolkovo Institute of Science and Technology
Numerical mathematics, tensors, deep learning, machine learning, matrix analysis

Marina A. Pak
Ligand Pro, Moscow, Russia