🤖 AI Summary
This work addresses the limited interpretability and inadequate integration of physicochemical information in existing computational models for predicting the effects of protein mutations on stability and solubility. To overcome these challenges, we propose SheafLapNet, a novel framework that, for the first time, incorporates the persistent sheaf Laplacian (PSL) into this task. By synergistically combining topological deep learning, protein Transformer-derived features, and physicochemical descriptors, SheafLapNet enables multiscale, mechanism-driven modeling of mutational effects. Our approach overcomes the insensitivity of conventional topological data analysis to heterogeneous information, substantially enhancing both model interpretability and generalization. SheafLapNet achieves state-of-the-art performance across multiple benchmarks, including S2648, S350, and PON-Sol2.
📝 Abstract
Genetic mutations frequently disrupt protein structure, stability, and solubility, acting as primary drivers of a wide spectrum of diseases. Despite the critical importance of these molecular alterations, existing computational models often lack interpretability and fail to integrate essential physicochemical interactions. To overcome these limitations, we propose SheafLapNet, a unified predictive framework grounded in Topological Deep Learning (TDL) and the Persistent Sheaf Laplacian (PSL). Unlike standard Topological Data Analysis (TDA) tools such as persistent homology, which are often insensitive to heterogeneous information, PSL encodes physical and chemical information, such as partial charges, directly into the topological analysis. SheafLapNet combines these sheaf-theoretic invariants with advanced protein transformer features and auxiliary physical descriptors to capture intrinsic molecular interactions in a multiscale and mechanistic manner. To validate our framework, we employ rigorous benchmarks for both regression and classification tasks. For stability prediction, we use the S2648 and S350 datasets. For solubility prediction, we use the PON-Sol2 dataset, which annotates each mutation as increasing, decreasing, or not changing solubility. By integrating these multi-perspective features, SheafLapNet achieves state-of-the-art performance across these benchmarks, demonstrating that sheaf-theoretic modeling significantly enhances both interpretability and generalizability in predicting mutation-induced structural and functional changes.
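To make concrete the idea that a sheaf Laplacian can carry partial charges into the topological analysis, here is a minimal single-scale sketch. It is not the SheafLapNet implementation: the function name `sheaf_laplacian_0`, the toy coordinates and charges, and the choice of restriction maps (each vertex value is scaled by the charge of the opposite endpoint divided by the edge length) are all assumptions made for illustration, and the persistence across filtration scales is omitted.

```python
import math

def sheaf_laplacian_0(points, charges, cutoff):
    """Degree-0 sheaf Laplacian L0 = delta^T delta on a cutoff graph.

    Assumed (illustrative) sheaf: every stalk is the real line, and the
    restriction map from vertex i into edge (i, j) multiplies by q_j / r_ij.
    """
    n = len(points)
    # Connect every pair of points closer than the cutoff.
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(points[i], points[j]) <= cutoff]
    # Coboundary matrix: one row per edge, one column per vertex.
    delta = []
    for i, j in edges:
        r = math.dist(points[i], points[j])
        row = [0.0] * n
        row[i] = -charges[j] / r   # restriction of vertex i into edge (i, j)
        row[j] = charges[i] / r    # restriction of vertex j into edge (i, j)
        delta.append(row)
    # L0 = delta^T delta, computed entry by entry.
    lap = [[sum(row[a] * row[b] for row in delta) for b in range(n)]
           for a in range(n)]
    return lap, edges

# Hypothetical toy input: three atoms with made-up partial charges.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
charges = [0.4, -0.3, 0.2]
L, edges = sheaf_laplacian_0(points, charges, cutoff=1.5)

# Apply L to the charge vector itself.
L_times_q = [sum(L[a][b] * charges[b] for b in range(len(charges)))
             for a in range(len(charges))]
```

With this choice of restriction maps, the charge vector lies in the kernel of the Laplacian, whereas an ordinary graph Laplacian has the constant vector in its kernel. The operator, and therefore its spectrum, depends on the charges and not only on the geometry, which is the kind of sensitivity to heterogeneous information the abstract attributes to PSL.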