🤖 AI Summary
Conventional marginal conformal prediction (CP) for drug–target interaction (DTI) prediction overlooks variability across drug and protein subgroups, so its uncertainty estimates can be unreliable for individual subgroups. Method: This paper analyzes three cluster-conditioned CP methods, whose clusterings are obtained from nonconformity scores, feature similarity, and nearest neighbors, respectively, and compares them with marginal and group-conditioned CP. Contribution/Results: On the KIBA dataset, across four data-splitting strategies, nonconformity-based clustering yields the tightest prediction intervals and the most reliable subgroup coverage, especially in random and fully unseen drug–protein splits. Group-conditioned CP works well when one entity is familiar, while residual-driven clustering provides robust uncertainty estimates even in sparse or novel scenarios. The results highlight the potential of cluster-based CP for improving DTI prediction under uncertainty.
📝 Abstract
Accurate drug-target interaction (DTI) prediction with machine learning models is essential for drug discovery. Such models should also provide a credible representation of their uncertainty, but applying classical marginal conformal prediction (CP) in DTI prediction often overlooks variability across drug and protein subgroups. In this work, we analyze three cluster-conditioned CP methods for DTI prediction, and compare them with marginal and group-conditioned CP. Clusterings are obtained via nonconformity scores, feature similarity, and nearest neighbors, respectively. Experiments on the KIBA dataset using four data-splitting strategies show that nonconformity-based clustering yields the tightest intervals and most reliable subgroup coverage, especially in random and fully unseen drug-protein splits. Group-conditioned CP works well when one entity is familiar, but residual-driven clustering provides robust uncertainty estimates even in sparse or novel scenarios. These results highlight the potential of cluster-based CP for improving DTI prediction under uncertainty.
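To make the idea of clustering on nonconformity scores concrete, the following is a minimal sketch, not the paper's implementation. It assumes an absolute-residual nonconformity score |y − ŷ|, summarizes each group (for example, each drug) by the median of its scores, and clusters those medians with a small one-dimensional k-means. The function names, the median summary, the choice of k-means, and the fallback to the marginal quantile for unseen groups are all illustrative assumptions; the abstract does not specify these details.

```python
import math
from collections import defaultdict

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based order statistic
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]

def median(values):
    s = sorted(values)
    m = len(s) // 2
    return s[m] if len(s) % 2 else 0.5 * (s[m - 1] + s[m])

def kmeans_1d(points, k, iters=50):
    """Plain Lloyd iterations on scalars; returns one cluster label per point."""
    s = sorted(points)
    # Deterministic initialisation: spread the centres across the sorted values.
    centres = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(p - centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels

def fit_clustered_cp(cluster_scores, calib_scores, alpha, k):
    """Both arguments map group id -> list of |y - y_hat| scores.

    Assumption: clustering uses scores held out from the calibration scores,
    so the per-cluster quantiles are not computed on the data that chose the clusters.
    """
    groups = sorted(cluster_scores)
    labels = kmeans_1d([median(cluster_scores[g]) for g in groups], k)
    group_to_cluster = dict(zip(groups, labels))
    pooled = defaultdict(list)
    for g, scores in calib_scores.items():
        if g in group_to_cluster:
            pooled[group_to_cluster[g]].extend(scores)
    qhat = {c: conformal_quantile(s, alpha) for c, s in pooled.items()}
    all_scores = [s for scores in calib_scores.values() for s in scores]
    marginal = conformal_quantile(all_scores, alpha)
    return group_to_cluster, qhat, marginal

def predict_interval(y_hat, group, group_to_cluster, qhat, marginal):
    """Symmetric interval; unseen groups fall back to the marginal quantile (assumption)."""
    q = qhat.get(group_to_cluster.get(group), marginal)
    return (y_hat - q, y_hat + q)

# Toy scores, made up for illustration: "a", "b" are low-noise groups, "c", "d" high-noise.
cluster_scores = {"a": [0.1, 0.2, 0.3], "b": [0.2, 0.3, 0.4],
                  "c": [2.0, 2.5, 3.0], "d": [2.2, 2.6, 3.1]}
calib_scores = {"a": [i / 10 for i in range(1, 11)], "b": [i / 10 for i in range(1, 11)],
                "c": [i / 2 for i in range(1, 11)], "d": [i / 2 for i in range(1, 11)]}
group_to_cluster, qhat, marginal = fit_clustered_cp(cluster_scores, calib_scores, alpha=0.1, k=2)
interval_a = predict_interval(7.0, "a", group_to_cluster, qhat, marginal)
interval_new = predict_interval(7.0, "z", group_to_cluster, qhat, marginal)
```

In this toy setup the low-noise cluster receives a narrower interval than the marginal quantile would give, while the high-noise cluster receives a wider one, which is the behaviour that subgroup-aware calibration is meant to capture. A group with no clustering data ("z" here) simply inherits the marginal interval.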