UniSite: The First Cross-Structure Dataset and Learning Framework for End-to-End Ligand Binding Site Detection

📅 2025-06-03

🤖 AI Summary

This paper addresses three challenges in protein-ligand binding site detection: datasets and methods centered on individual complexes, which overlook the diverse binding sites found across multiple complexes of the same protein and so introduce statistical bias; fragmented pipelines that run binary segmentation followed by clustering; and evaluation metrics that do not reflect actual prediction quality. It introduces UniSite-DS, the first UniProt-centric ligand binding site dataset, with 4.81× more multi-site data and 2.08× more data overall than the previously most widely used datasets. It also proposes UniSite, an end-to-end detection framework that replaces the separate segmentation and clustering steps with set prediction, supervised through bijective matching using the Hungarian algorithm. For evaluation, it adopts Average Precision (AP) based on Intersection over Union (IoU). Experiments on UniSite-DS and several representative benchmarks show that UniSite outperforms current state-of-the-art methods and that IoU-based AP reflects prediction quality more accurately than traditional metrics.
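The matching step named in the summary can be illustrated with a small sketch. It is not the authors' implementation: it assumes each binding site is represented as a set of residue indices, uses residue-level IoU as a stand-in for whatever matching cost the paper defines, and replaces the Hungarian algorithm with an exhaustive search over permutations, which finds the same optimal one-to-one assignment but is only practical for a handful of sites. The names `iou` and `match_sites` are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over Union of two residue-index sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_sites(predicted, truth):
    """Return the one-to-one assignment that maximizes total IoU.

    Assumption: len(predicted) >= len(truth), as in set-prediction models
    that emit a fixed number of candidate sites. Brute force stands in for
    the Hungarian algorithm here; both yield an optimal assignment.
    """
    best_score, best_pairs = -1.0, []
    # Each permutation assigns a distinct predicted site to every true site.
    for perm in permutations(range(len(predicted)), len(truth)):
        score = sum(iou(predicted[p], truth[t]) for t, p in enumerate(perm))
        if score > best_score:
            best_score = score
            best_pairs = [(p, t) for t, p in enumerate(perm)]
    return best_pairs, best_score


# Toy data (made up for illustration): three predicted sites, two true sites.
predicted = [{1, 2, 3, 4}, {10, 11, 12}, {20, 21}]
truth = [{10, 11, 12, 13}, {2, 3, 4, 5}]
pairs, total = match_sites(predicted, truth)
print(pairs, round(total, 2))
```

In this toy case predicted site 1 is paired with true site 0 and predicted site 0 with true site 1, while predicted site 2 stays unmatched. In a set-prediction loss, unmatched predictions are typically supervised as "no site".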

📝 Abstract

The detection of ligand binding sites for proteins is a fundamental step in Structure-Based Drug Design. Despite notable advances in recent years, existing methods, datasets, and evaluation metrics are confronted with several key challenges: (1) current datasets and methods are centered on individual protein-ligand complexes and neglect that diverse binding sites may exist across multiple complexes of the same protein, introducing significant statistical bias; (2) ligand binding site detection is typically modeled as a discontinuous workflow, employing binary segmentation and subsequent clustering algorithms; (3) traditional evaluation metrics do not adequately reflect the actual performance of different binding site prediction methods. To address these issues, we first introduce UniSite-DS, the first UniProt (Unique Protein)-centric ligand binding site dataset, which contains 4.81 times more multi-site data and 2.08 times more overall data compared to the previously most widely used datasets. We then propose UniSite, the first end-to-end ligand binding site detection framework supervised by set prediction loss with bijective matching. In addition, we introduce Average Precision based on Intersection over Union (IoU) as a more accurate evaluation metric for ligand binding site prediction. Extensive experiments on UniSite-DS and several representative benchmark datasets demonstrate that IoU-based Average Precision provides a more accurate reflection of prediction quality, and that UniSite outperforms current state-of-the-art methods in ligand binding site detection. The dataset and code will be made publicly available at https://github.com/quanlin-wu/unisite.
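To make the IoU-based Average Precision idea concrete, here is a minimal sketch in the style of object-detection AP. The abstract does not spell out the evaluation protocol, so the details are assumptions: sites are sets of residue indices, each prediction carries a confidence score, a prediction counts as a true positive when its IoU with a not-yet-claimed true site reaches `iou_threshold`, and AP is the mean of the precision values at each true positive, divided by the number of true sites. The function names are hypothetical.

```python
def residue_iou(a, b):
    """Intersection over Union of two residue-index sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_precision(predictions, truth, iou_threshold=0.5):
    """AP for one protein.

    predictions: list of (confidence, residue_set) pairs.
    truth: list of residue sets, one per true binding site.
    """
    if not truth:
        return 0.0
    # Walk through predictions from most to least confident.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    claimed = set()          # indices of true sites already matched
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, site) in enumerate(ranked, start=1):
        # Find the best-overlapping true site that is still unclaimed.
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(truth):
            if idx in claimed:
                continue
            overlap = residue_iou(site, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_threshold:
            claimed.add(best_idx)
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / len(truth)


# Toy data (made up for illustration).
truth = [{1, 2, 3, 4}, {10, 11, 12, 13}]
predictions = [(0.9, {1, 2, 3, 4}), (0.8, {30, 31}), (0.7, {10, 11, 12})]
ap_loose = average_precision(predictions, truth, iou_threshold=0.5)
ap_strict = average_precision(predictions, truth, iou_threshold=0.8)
print(round(ap_loose, 3), round(ap_strict, 3))
```

Raising the threshold from 0.5 to 0.8 turns the third prediction (IoU 0.75) into a miss, so the score drops. This sensitivity to how well a predicted site overlaps the true one is what an IoU-based metric adds over counting any nearby prediction as a hit.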

Problem

Research questions and friction points this paper is trying to address.

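Datasets and methods centered on individual complexes miss the diverse binding sites found across multiple complexes of the same protein

Binding site detection is modeled as a discontinuous workflow of binary segmentation followed by clustering

Traditional evaluation metrics do not adequately reflect the actual performance of binding site prediction methods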

Innovation

Methods, ideas, or system contributions that make the work stand out.

First UniProt-centric ligand binding site dataset

End-to-end detection framework with set prediction

IoU-based Average Precision for accurate evaluation

🔎 Similar Papers

FusionDTI: Fine-grained Binding Discovery with Token-level Fusion for Drug-Target Interaction