🤖 AI Summary
Existing drug–target interaction (DTI) prediction benchmarks suffer from inconsistent hyperparameter configurations and dataset preprocessing, hindering fair model comparison and progress. To address this, we propose a standardized benchmark framework grounded in molecular structure modeling, spanning architectural paradigms (GNNs vs. Transformers), representation modalities (sequence-only, 3D conformational, and joint encodings), and six widely used DTI datasets. The framework enforces uniform hyperparameter tuning and a comprehensive evaluation protocol that assesses both predictive performance and computational efficiency (e.g., inference memory and latency). Within this framework, we introduce lightweight, high-efficiency model combinations ("combos") that achieve new state-of-the-art (SOTA) results on the benchmarked datasets while substantially reducing memory footprint and computational overhead. We publicly release the full codebase, pretrained models, and evaluation protocols to improve reproducibility and enable equitable benchmarking in DTI prediction research.
📝 Abstract
Predictive modeling of drug–target interactions is crucial to drug discovery and design, and it has advanced rapidly owing to deep learning. Recently developed methods, such as those based on graph neural networks (GNNs) and Transformers, achieve strong performance across various datasets by effectively extracting structural information. However, the benchmarking of these methods often differs significantly in hyperparameter settings and datasets, which limits algorithmic progress. In view of this, we conduct a comprehensive survey and benchmark of drug–target interaction modeling from a structure perspective, integrating tens of explicit (i.e., GNN-based) and implicit (i.e., Transformer-based) structure learning algorithms. To this end, we first unify the hyperparameter settings within each class of structure learning methods. We then conduct a macroscopic comparison between these two classes of encoding strategies, as well as between the featurization techniques that encode molecules' chemical and physical properties. Next, we carry out a microscopic comparison of all integrated models across the six datasets, comprehensively benchmarking their effectiveness and efficiency. Remarkably, the insights summarized from these benchmark studies lead to the design of model combos. We demonstrate that our combos achieve new state-of-the-art performance on various datasets while remaining cost-effective in memory and computation. Our code is available at https://github.com/justinwjl/GTB-DTI/tree/main.
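To make the explicit vs. implicit distinction concrete, below is a minimal, hypothetical PyTorch sketch (not the GTB-DTI codebase or its API): an explicit structure encoder performs message passing over a molecular graph's adjacency matrix, an implicit encoder applies self-attention to a SMILES token sequence, and a simple "combo" head concatenates both drug views with a protein embedding to score an interaction. All module names, dimensions, and the random toy inputs are illustrative assumptions.

```python
# Hypothetical sketch: explicit (GNN) vs. implicit (Transformer) drug encoders
# plus a simple combo head for DTI scoring. Not the paper's implementation.
import torch
import torch.nn as nn

class TinyGCNEncoder(nn.Module):
    """Explicit structure: message passing over an adjacency matrix."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hid_dim)
        self.lin2 = nn.Linear(hid_dim, hid_dim)

    def forward(self, x, adj):            # x: [N, in_dim], adj: [N, N]
        h = torch.relu(adj @ self.lin1(x))
        h = torch.relu(adj @ self.lin2(h))
        return h.mean(dim=0)               # graph-level readout: [hid_dim]

class TinySeqEncoder(nn.Module):
    """Implicit structure: self-attention over SMILES token embeddings."""
    def __init__(self, vocab, hid_dim, nhead=4, layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, hid_dim)
        layer = nn.TransformerEncoderLayer(hid_dim, nhead, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):             # tokens: [1, L]
        return self.enc(self.emb(tokens)).mean(dim=1).squeeze(0)  # [hid_dim]

class ComboDTIHead(nn.Module):
    """Concatenate both drug views with a protein embedding and score."""
    def __init__(self, hid_dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3 * hid_dim, hid_dim),
                                 nn.ReLU(), nn.Linear(hid_dim, 1))

    def forward(self, g_emb, s_emb, p_emb):
        return self.mlp(torch.cat([g_emb, s_emb, p_emb], dim=-1))

# Toy usage with random data (shapes only; real featurization differs).
N, F, H, V, L = 9, 16, 32, 64, 20
gnn, seq, head = TinyGCNEncoder(F, H), TinySeqEncoder(V, H), ComboDTIHead(H)
score = head(gnn(torch.randn(N, F), torch.eye(N)),     # explicit drug view
             seq(torch.randint(0, V, (1, L))),          # implicit drug view
             torch.randn(H))                            # stand-in protein embedding
print(score.shape)                                      # torch.Size([1])
```

The combo idea here is simply late fusion of the two drug representations; the paper's actual combos, featurizations, and protein encoders should be taken from the released code rather than this sketch.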