🤖 AI Summary
This paper targets three challenges in protein–ligand binding site detection: datasets and methods built around individual complexes, which overlook the different binding sites a single protein can present across complexes and thereby introduce statistical bias; fragmented pipelines that run binary segmentation and clustering as separate steps; and evaluation metrics that do not reflect actual prediction quality. It introduces UniSite-DS, the first UniProt-centric ligand binding site dataset, with 4.81× more multi-site data and 2.08× more data overall than the previously most widely used datasets. It also proposes UniSite, an end-to-end detection framework that replaces the segment-then-cluster workflow with set prediction, supervised through one-to-one (bipartite) matching between predicted and ground-truth sites using the Hungarian algorithm. For evaluation, it adopts Average Precision (AP) based on Intersection over Union (IoU). Experiments on UniSite-DS and several representative benchmarks indicate that IoU-based AP reflects prediction quality more accurately than traditional metrics and that UniSite outperforms current state-of-the-art methods.
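The matching step mentioned above can be illustrated with a small sketch. It is not the authors' implementation: it assumes binding sites are represented as sets of residue indices, uses IoU as the matching score, and finds the optimal one-to-one assignment by brute force over permutations as a stand-in for the Hungarian algorithm, which returns the same optimum far more efficiently. The names `iou` and `match_sites` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two residue-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_sites(pred_sites, true_sites):
    """Return (pairs, total_iou) for the one-to-one assignment with maximal total IoU.

    Assumption: there are at least as many predictions as true sites, as in
    set prediction with a fixed number of output slots. Brute force is only
    practical for small sets; the Hungarian algorithm gives the same optimum.
    """
    if len(true_sites) > len(pred_sites):
        raise ValueError("expected at least as many predictions as true sites")
    best_pairs, best_score = [], -1.0
    # Each permutation assigns a distinct prediction index to every true site.
    for perm in permutations(range(len(pred_sites)), len(true_sites)):
        score = sum(iou(pred_sites[p], true_sites[t]) for t, p in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_pairs, best_score


# Toy example: three predicted sites, two ground-truth sites.
pred = [{1, 2, 3, 4}, {10, 11, 12}, {20, 21}]
true = [{10, 11, 12, 13}, {1, 2, 3}]
pairs, total = match_sites(pred, true)
print(pairs, total)
```

In this toy case, prediction 1 is paired with true site 0 and prediction 0 with true site 1, and the third prediction is left unmatched. In a training setting, unmatched predictions would typically be supervised as "no site", though the exact loss used by UniSite is not specified here.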
📝 Abstract
The detection of ligand binding sites for proteins is a fundamental step in structure-based drug design. Despite notable advances in recent years, existing methods, datasets, and evaluation metrics face several key challenges: (1) current datasets and methods are centered on individual protein-ligand complexes and neglect that diverse binding sites may exist across multiple complexes of the same protein, introducing significant statistical bias; (2) ligand binding site detection is typically modeled as a discontinuous workflow, employing binary segmentation followed by clustering algorithms; (3) traditional evaluation metrics do not adequately reflect the actual performance of different binding site prediction methods. To address these issues, we first introduce UniSite-DS, the first UniProt (Unique Protein)-centric ligand binding site dataset, which contains 4.81 times more multi-site data and 2.08 times more overall data than the previously most widely used datasets. We then propose UniSite, the first end-to-end ligand binding site detection framework supervised by a set prediction loss with bijective matching. In addition, we introduce Average Precision based on Intersection over Union (IoU) as a more accurate evaluation metric for ligand binding site prediction. Extensive experiments on UniSite-DS and several representative benchmark datasets demonstrate that IoU-based Average Precision provides a more accurate reflection of prediction quality, and that UniSite outperforms current state-of-the-art methods in ligand binding site detection. The dataset and code will be made publicly available at https://github.com/quanlin-wu/unisite.
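To make the IoU-based Average Precision idea concrete, the following is a minimal sketch of one common detection-style formulation, not necessarily the exact definition used in the paper. It assumes each predicted site is a set of residue indices with a confidence score, counts a prediction as a true positive when its IoU with a still-unmatched ground-truth site reaches a threshold, and computes non-interpolated AP. The function names and the default threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two residue-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_precision(preds, true_sites, iou_threshold=0.5):
    """Non-interpolated AP for one protein.

    preds: list of (confidence, residue_set) pairs.
    true_sites: list of residue sets (ground-truth binding sites).
    """
    if not true_sites:
        return 0.0
    matched = set()       # indices of true sites already claimed
    hits = 0              # true positives seen so far
    precision_sum = 0.0   # sum of precision values at each true positive
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, site) in enumerate(ranked, start=1):
        # Find the best still-unmatched true site for this prediction.
        best_t, best_iou = None, 0.0
        for t, truth in enumerate(true_sites):
            if t in matched:
                continue
            value = iou(site, truth)
            if value > best_iou:
                best_t, best_iou = t, value
        if best_t is not None and best_iou >= iou_threshold:
            matched.add(best_t)
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of true sites penalizes missed sites.
    return precision_sum / len(true_sites)


# Toy example: two true sites, three ranked predictions.
true = [{1, 2, 3, 4}, {10, 11, 12, 13}]
preds = [(0.9, {1, 2, 3, 4}), (0.8, {50, 51}), (0.6, {10, 11, 12})]
ap_loose = average_precision(preds, true, iou_threshold=0.5)
ap_strict = average_precision(preds, true, iou_threshold=0.8)
print(ap_loose, ap_strict)
```

Raising the threshold from 0.5 to 0.8 turns the third prediction (IoU 0.75) into a false positive, which lowers the score. This sensitivity to how well a predicted site overlaps the true one is the property that overlap-agnostic metrics lack.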