🤖 AI Summary
Enzyme turnover number (k<sub>cat</sub>) is a fundamental kinetic parameter that reflects catalytic efficiency, yet measuring it experimentally is costly and low-throughput, so large-scale k<sub>cat</sub> data remain scarce. To address this, we propose a multimodal, interpretable prediction framework that integrates a pretrained protein language model, a graph neural network for substrate molecular structure representation, and an environmental feature encoder. We also use symbolic regression via Kolmogorov–Arnold Networks to learn explicit, analytically tractable formulas for k<sub>cat</sub>. Evaluated on multiple benchmark datasets, our method outperforms conventional QSAR approaches and state-of-the-art deep learning models, achieving a 23.6% average reduction in MAE and an R² of 0.89. The framework thus combines high predictive accuracy with interpretability, providing a computational tool for enzyme engineering and biocatalyst design.
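For intuition about the Kolmogorov–Arnold idea mentioned above, the sketch below fits a single-layer, KAN-style additive model: the output is a sum of learnable one-dimensional functions, one per input, each written as a weighted sum of fixed Gaussian bumps. This is not the authors' model. The toy target `sin(x1) + x2**2`, the Gaussian basis, the helper names `rbf_features`, `fit_additive`, and `predict_additive`, and the least-squares fit are all assumptions chosen for brevity; full KANs stack several such layers, typically use B-spline bases, and are trained by gradient descent.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Evaluate Gaussian basis functions at each value of a 1-D input array."""
    # Result has shape (n_samples, n_centers): one column per basis function.
    return np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)

def fit_additive(X, y, centers, width):
    """Fit y ~ sum_i phi_i(X[:, i]), each phi_i a weighted sum of Gaussians."""
    # Stack the per-input basis expansions side by side; the model is
    # linear in the weights, so ordinary least squares is enough here.
    design = np.hstack(
        [rbf_features(X[:, i], centers, width) for i in range(X.shape[1])]
    )
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Row i holds the basis weights that define phi_i.
    return weights.reshape(X.shape[1], centers.size)

def predict_additive(X, weights, centers, width):
    """Sum the learned one-dimensional functions over all inputs."""
    return sum(
        rbf_features(X[:, i], centers, width) @ weights[i]
        for i in range(X.shape[1])
    )

# Hypothetical toy data: two inputs and an additive "law" to recover.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(400, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2

centers = np.linspace(-1.0, 1.0, 8)
width = 0.5
weights = fit_additive(X, y, centers, width)
pred = predict_additive(X, weights, centers, width)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

Because each learned function depends on a single input, it can be plotted or matched against candidate closed forms on its own. That is the sense in which models of this shape lend themselves to symbolic regression.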
📝 Abstract
The enzyme turnover rate is a fundamental parameter in enzyme kinetics, reflecting the catalytic efficiency of enzymes. However, measured turnover rates remain scarce for most organisms due to the high cost and complexity of experimental measurements. To address this gap, we propose a multimodal framework for predicting the enzyme turnover rate by integrating enzyme sequences, substrate structures, and environmental factors. Our model combines a pre-trained language model and a convolutional neural network to extract features from protein sequences, while a graph neural network captures informative representations from substrate molecules. An attention mechanism is incorporated to enhance interactions between enzyme and substrate representations. Furthermore, we leverage symbolic regression via Kolmogorov-Arnold Networks to explicitly learn mathematical formulas that govern the enzyme turnover rate, enabling interpretable and accurate predictions. Extensive experiments demonstrate that our framework outperforms both traditional and state-of-the-art deep learning approaches. This work provides a robust tool for studying enzyme kinetics and holds promise for applications in enzyme engineering, biotechnology, and industrial biocatalysis.
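The attention step that links enzyme and substrate representations can be illustrated with a minimal scaled dot-product cross-attention sketch in NumPy. The abstract does not specify the attention variant, so everything here is an assumption: the function name `cross_attention`, the choice of substrate atoms as queries and enzyme residues as keys and values, the random projection matrices, and the toy dimensions. Random arrays stand in for the outputs of the sequence encoder and the graph neural network.

```python
import numpy as np

def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_attention(substrate, enzyme, w_q, w_k, w_v):
    """Let each substrate atom attend over all enzyme residues.

    substrate: (n_atoms, d_sub) atom embeddings (stand-in for GNN output)
    enzyme:    (n_residues, d_enz) residue embeddings (stand-in for the
               sequence encoder output)
    Returns attended features (n_atoms, d_k) and attention weights
    (n_atoms, n_residues).
    """
    queries = substrate @ w_q
    keys = enzyme @ w_k
    values = enzyme @ w_v
    # Scale by sqrt(d_k) so the scores do not grow with the key dimension.
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

# Hypothetical toy sizes; real embeddings would come from trained encoders.
rng = np.random.default_rng(0)
n_atoms, n_residues, d_sub, d_enz, d_k = 5, 12, 16, 32, 8
substrate = rng.normal(size=(n_atoms, d_sub))
enzyme = rng.normal(size=(n_residues, d_enz))
w_q = rng.normal(size=(d_sub, d_k))
w_k = rng.normal(size=(d_enz, d_k))
w_v = rng.normal(size=(d_enz, d_k))

attended, weights = cross_attention(substrate, enzyme, w_q, w_k, w_v)
# Mean-pool over atoms to get one fixed-size vector per enzyme-substrate pair.
pair_vector = attended.mean(axis=0)
print(attended.shape, weights.shape, pair_vector.shape)
```

Each row of `weights` sums to one and can be read as how strongly a given atom attends to each residue. In a full model, a pooled vector like `pair_vector` would presumably be combined with the environmental features before the final prediction layer, though the abstract does not say how.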