🤖 AI Summary
To address data scarcity, poor uncertainty calibration, and limited interpretability in cyclic peptide membrane permeability prediction, this work proposes the Monomer-aware Decoupled Global Alignment Kernel (MD-GAK) and a variant with a triangular positional prior (PMD-GAK). MD-GAK combines chemically meaningful residue-residue similarity with sequence alignment while decoupling local residue matches from gap penalties; PMD-GAK additionally weights alignments by a triangular positional prior. Both kernels are used within a Gaussian process framework to give probabilistic predictions that remain robust in low-data regimes, with PMD-GAK in particular reducing calibration error. The authors report that their fully reproducible approach outperforms existing state-of-the-art models across all metrics, in both predictive accuracy and uncertainty quantification.
📝 Abstract
Cyclic peptides are promising modalities for targeting intracellular sites; however, cell-membrane permeability remains a key bottleneck, exacerbated by limited public data and the need for well-calibrated uncertainty. Instead of relying on data-hungry, complex deep learning architectures, we propose a monomer-aware decoupled global alignment kernel (MD-GAK), which couples chemically meaningful residue-residue similarity with sequence alignment while decoupling local matches from gap penalties. MD-GAK is a relatively simple kernel. To further demonstrate the robustness of our framework, we also introduce a variant, PMD-GAK, which incorporates a triangular positional prior. As we show in the experimental section, PMD-GAK can offer additional advantages over MD-GAK, particularly in reducing calibration error. Since our focus is on uncertainty estimation, we use Gaussian processes as the predictive model, as both MD-GAK and PMD-GAK can be applied directly within this framework. We demonstrate the effectiveness of our methods through an extensive set of experiments, comparing our fully reproducible approach against state-of-the-art models, and show that it outperforms them across all metrics.
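To make the idea of decoupling matches from gaps concrete, the following is a minimal sketch and not the authors' formulation. It assumes a simple recursion in which diagonal (match) steps are weighted by a monomer similarity and gap steps by a separate constant, with an optional triangular weight on the index offset standing in for a positional prior. The names `toy_sim`, `triangular_weight`, `alignment_kernel`, and `normalized_gram`, the parameters `gap` and `window`, and the similarity values are all hypothetical; the abstract does not specify the exact recursion or similarity measure, and positive definiteness of this toy kernel is not verified here.

```python
import numpy as np


def toy_sim(a, b):
    # Hypothetical stand-in for a chemically meaningful monomer similarity.
    return 1.0 if a == b else 0.2


def triangular_weight(i, j, window):
    # Triangular positional prior (assumed form): 1 on the diagonal,
    # decaying linearly to 0 once |i - j| reaches `window`.
    if window is None:
        return 1.0
    return max(0.0, 1.0 - abs(i - j) / window)


def alignment_kernel(x, y, sim=toy_sim, gap=0.5, window=None):
    """Sum over global alignments of monomer sequences x and y.

    Match steps contribute sim(x_i, y_j); gap steps contribute the
    separate constant `gap`, so the two terms are decoupled.
    """
    n, m = len(x), len(y)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:
                total += sim(x[i - 1], y[j - 1]) * M[i - 1, j - 1]  # match step
            if i > 0:
                total += gap * M[i - 1, j]  # gap in y
            if j > 0:
                total += gap * M[i, j - 1]  # gap in x
            M[i, j] = triangular_weight(i, j, window) * total
    return M[n, m]


def normalized_gram(seqs, **kwargs):
    # Gram matrix rescaled so that every sequence has unit self-similarity.
    K = np.array([[alignment_kernel(a, b, **kwargs) for b in seqs] for a in seqs])
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


if __name__ == "__main__":
    peptides = [["Ala", "Leu", "Pro"], ["Ala", "Pro"], ["Leu", "Leu", "Pro"]]
    print(normalized_gram(peptides, gap=0.5, window=3))
```

A Gram matrix of this kind is what a Gaussian process would take as its covariance over training sequences, provided the kernel is positive semidefinite. Setting `window=None` drops the positional weighting, which loosely mirrors the distinction the abstract draws between MD-GAK and PMD-GAK.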