🤖 AI Summary
This work proposes ATTNSOM, an atom-level site-of-metabolism prediction framework that explicitly models metabolic relationships across cytochrome P450 (CYP) isoforms. Existing methods either ignore isoform identity or treat each isoform independently, and so fail to capture shared metabolic patterns. ATTNSOM combines a shared graph encoder, molecule-conditioned atom representations, and a cross-attention mechanism to learn inter-isoform dependencies jointly. Because top-k metrics can admit false positive atoms among the top predictions, the study also reports the Matthews correlation coefficient to assess binary atom-level discrimination under severe class imbalance. On two benchmarks annotated at atom resolution, the method achieves consistently strong top-k performance across multiple CYP isoforms and a higher Matthews correlation coefficient than its ablated variants, supporting the value of cross-isoform modeling for CYP site-of-metabolism prediction.
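To make the cross-attention idea concrete, the following is a minimal sketch of how per-isoform representations of a single atom could exchange information through scaled dot-product attention. It is not the ATTNSOM implementation: the function name `cross_isoform_attention`, the use of identity query/key/value projections (no learned weights), the vector values, and the isoform names are all assumptions chosen for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross_isoform_attention(isoform_vectors):
    """Mix the per-isoform vectors of ONE atom with scaled dot-product attention.

    isoform_vectors: list of K vectors (one per isoform), each of dimension d.
    Returns K vectors; each is an attention-weighted average of all K inputs.
    Assumption: queries, keys, and values are the raw vectors themselves,
    whereas a trained model would apply learned projections first.
    """
    d = len(isoform_vectors[0])
    scale = math.sqrt(d)
    mixed = []
    for query in isoform_vectors:
        # Similarity of this isoform's view of the atom to every isoform's view.
        weights = softmax([dot(query, key) / scale for key in isoform_vectors])
        mixed.append(
            [sum(w * value[j] for w, value in zip(weights, isoform_vectors))
             for j in range(d)]
        )
    return mixed


# Hypothetical 2-dimensional representations of one atom under three isoforms.
# The isoform names are illustrative and need not match the benchmarks.
atom_views = {
    "CYP3A4": [1.0, 0.0],
    "CYP2D6": [0.8, 0.2],
    "CYP2C9": [0.0, 1.0],
}
mixed_views = cross_isoform_attention(list(atom_views.values()))
```

Each output vector is a convex combination of the inputs, so an isoform whose view of the atom resembles another's borrows more from it. That is one simple way correlated metabolic patterns could be shared across isoforms.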
📝 Abstract
Identifying the sites at which cytochrome P450 enzymes metabolize small-molecule drugs is essential for drug discovery. Although computational approaches have been proposed for site-of-metabolism prediction, they typically ignore cytochrome P450 isoform identity or model isoforms independently, and therefore fail to fully capture inherent cross-isoform metabolic patterns. In addition, prior evaluations often rely on top-k metrics, under which false positive atoms may be included among the top predictions. This underscores the need for complementary metrics that more directly assess binary atom-level discrimination under severe class imbalance. We propose ATTNSOM, an atom-level site-of-metabolism prediction framework that integrates intrinsic molecular reactivity with cross-isoform relationships. The model combines a shared graph encoder, molecule-conditioned atom representations, and a cross-attention mechanism to capture correlated metabolic patterns across cytochrome P450 isoforms. We evaluate the model on two benchmark datasets annotated with site-of-metabolism labels at atom resolution. On both benchmarks, it achieves consistently strong top-k performance across multiple cytochrome P450 isoforms. Relative to ablated variants, it also yields a higher Matthews correlation coefficient, indicating improved discrimination of true metabolic sites. These results support the importance of explicitly modeling cross-isoform relationships for site-of-metabolism prediction. The code and datasets are available at https://github.com/dmis-lab/ATTNSOM.
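The point about top-k metrics can be illustrated with a small worked example. The sketch below uses a hypothetical 10-atom molecule with one labeled site and two made-up score vectors; the helper names `top_k_hit` and `mcc` and the 0.5 decision threshold are assumptions for illustration, not the evaluation code from the repository.

```python
import math


def top_k_hit(scores, labels, k):
    """Return True if any of the k highest-scoring atoms is a labeled site."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return any(labels[i] == 1 for i in ranked[:k])


def mcc(scores, labels, threshold=0.5):
    """Matthews correlation coefficient of thresholded atom-level predictions."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical molecule: 10 atoms, only atom index 2 is a true metabolic site.
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Model A ranks the true site third, behind two confident false positives.
scores_a = [0.9, 0.8, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
# Model B scores only the true site above the threshold.
scores_b = [0.2, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

hit_a = top_k_hit(scores_a, labels, k=3)  # True
hit_b = top_k_hit(scores_b, labels, k=3)  # True
mcc_a = mcc(scores_a, labels)             # 7 / sqrt(189), about 0.51
mcc_b = mcc(scores_b, labels)             # 1.0
```

Both hypothetical models register a top-3 hit, yet only the second separates the true site from the remaining atoms. The Matthews correlation coefficient exposes this gap because it penalizes the two false positives that top-3 accuracy ignores.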