🤖 AI Summary
To address three key challenges in molecular docking (chemically implausible ligand poses, poor generalization to unseen targets, and high computational cost), this work introduces SigmaDock, an SE(3) Riemannian diffusion model that generates ligand poses at the level of rigid fragments. Methodologically, a fragmentation scheme informed by structural chemistry decomposes ligands into rigid-body fragments. The model then learns to reassemble these fragments within the protein binding pocket through diffusion on SE(3). This exploits well-established geometric priors while avoiding overly complex diffusion processes and unstable training dynamics. On the PoseBusters benchmark, it reaches a Top-1 success rate (RMSD < 2 Å and PB-valid) above 79.9%, compared with the 12.7–30.8% reported by recent deep learning approaches. It is the first deep learning method to surpass classical physics-based docking under the PoseBusters train-test split, and it generalizes consistently to unseen proteins. This represents a significant advance in geometry-aware generative modeling for structure-based drug design.
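The Top-1 success criterion quoted above (RMSD < 2 Å and PB-valid) can be sketched as follows. This is a minimal illustration that rests on two assumptions. First, predicted and reference ligand atoms have a fixed one-to-one correspondence. Second, the PoseBusters validity check has already been run and is supplied as a boolean. The function names `ligand_rmsd` and `top1_success_rate` are hypothetical. Published evaluations typically use a symmetry-aware RMSD rather than the plain version shown here.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

def ligand_rmsd(pred: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Plain RMSD over matched atoms, computed in the pocket frame.

    No superposition is applied, because docking poses are scored where they sit.
    No symmetry correction is attempted (simplifying assumption).
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("pred and ref must be non-empty and equally long")
    sq = sum(
        (p[0] - r[0]) ** 2 + (p[1] - r[1]) ** 2 + (p[2] - r[2]) ** 2
        for p, r in zip(pred, ref)
    )
    return math.sqrt(sq / len(pred))

def top1_success_rate(
    results: List[Tuple[Sequence[Coord], Sequence[Coord], bool]],
    threshold: float = 2.0,
) -> float:
    """Fraction of complexes whose top-ranked pose is PB-valid AND within `threshold` Å.

    Each entry is (top1_pred_coords, reference_coords, pb_valid). `pb_valid` is
    assumed to come from an external validity check and is not computed here.
    """
    if not results:
        raise ValueError("results must be non-empty")
    hits = sum(
        1
        for pred, ref, pb_valid in results
        if pb_valid and ligand_rmsd(pred, ref) < threshold
    )
    return hits / len(results)
```

A pose that lies close to the reference but fails the validity check does not count. This makes the combined criterion stricter than RMSD alone.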
📝 Abstract
Determining the binding pose of a ligand to a protein, known as molecular docking, is a fundamental task in drug discovery. Generative approaches promise faster, improved, and more diverse pose sampling than physics-based methods, but are often hindered by chemically implausible outputs, poor generalisability, and high computational cost. To address these challenges, we introduce a novel fragmentation scheme, leveraging inductive biases from structural chemistry, to decompose ligands into rigid-body fragments. Building on this decomposition, we present SigmaDock, an SE(3) Riemannian diffusion model that generates poses by learning to reassemble these rigid bodies within the binding pocket. By operating at the level of fragments in SE(3), SigmaDock exploits well-established geometric priors while avoiding overly complex diffusion processes and unstable training dynamics. Experimentally, we show SigmaDock achieves state-of-the-art performance, reaching Top-1 success rates (RMSD < 2 Å and PB-valid) above 79.9% on the PoseBusters set, compared to 12.7–30.8% reported by recent deep learning approaches, whilst demonstrating consistent generalisation to unseen proteins. SigmaDock is the first deep learning approach to surpass classical physics-based docking under the PB train-test split, marking a significant leap forward in the reliability and feasibility of deep learning for molecular modelling.
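The fragment-level view described above can be illustrated with a toy sketch. Each rigid fragment is stored in its own local frame, and a pose is one SE(3) transform (rotation R, translation t) per fragment. Noise is applied to the transforms rather than to individual atoms, so bond lengths and angles inside each fragment are preserved by construction. Everything here is hypothetical: the helper names, the fragment coordinates, and in particular `perturb_se3`. That function draws a Gaussian angle about a uniformly random axis as a crude stand-in for rotational noise in SE(3) diffusion. It is not IGSO(3) and not SigmaDock's actual noising process. Constraints between fragments (such as the bonds cut during fragmentation) are ignored.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = Tuple[Vec, Vec, Vec]
IDENTITY: Mat = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

def axis_angle_to_matrix(axis: Vec, angle: float) -> Mat:
    """Rodrigues' formula: rotation by `angle` radians about `axis` (normalised here)."""
    n = math.sqrt(sum(a * a for a in axis))
    if n == 0.0:
        raise ValueError("axis must be non-zero")
    x, y, z = (a / n for a in axis)
    c, s = math.cos(angle), math.sin(angle)
    C = 1.0 - c
    return (
        (c + x * x * C, x * y * C - z * s, x * z * C + y * s),
        (y * x * C + z * s, c + y * y * C, y * z * C - x * s),
        (z * x * C - y * s, z * y * C + x * s, c + z * z * C),
    )

def matmul(A: Mat, B: Mat) -> Mat:
    """3x3 matrix product A @ B."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

def apply_se3(R: Mat, t: Vec, points: Sequence[Vec]) -> List[Vec]:
    """Map local fragment coordinates to the pocket frame: x' = R x + t."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def assemble_pose(fragments: Sequence[Sequence[Vec]], transforms) -> List[Vec]:
    """Concatenate all fragments after placing each one with its own (R, t)."""
    coords: List[Vec] = []
    for local, (R, t) in zip(fragments, transforms):
        coords.extend(apply_se3(R, t, local))
    return coords

def perturb_se3(
    R: Mat, t: Vec, rng: random.Random, rot_sigma: float, trans_sigma: float
) -> Tuple[Mat, Vec]:
    """Toy noise on one fragment's pose (NOT IGSO(3)).

    Left-composes a random-axis rotation, which spins the fragment about its local
    origin (assumed to be its centre), and adds an isotropic Gaussian shift to t.
    """
    axis = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    dR = axis_angle_to_matrix(axis, rng.gauss(0.0, rot_sigma))
    t_new = tuple(t[i] + rng.gauss(0.0, trans_sigma) for i in range(3))
    return matmul(dR, R), t_new
```

Because `perturb_se3` only touches (R, t), any distance measured inside a fragment is unchanged. Distances between fragments, by contrast, change freely. This is the property that lets a fragment-level model avoid producing distorted geometry within rigid units.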