AI Summary
Current glycan modeling predominantly relies on monosaccharide-level graphs, neglecting atomic-level details critical to glycan function and physicochemical properties. To address this limitation, we propose GlycanAA, the first heterogeneous graph neural network for full-atom glycan modeling. GlycanAA unifies monosaccharide units and individual atoms as heterogeneous graph nodes and introduces a hierarchical message-passing mechanism that jointly captures local atomic interactions and global glycosidic backbone topology. Furthermore, we design a multi-scale masked prediction pretraining strategy that concurrently models dependencies at the atom, bond, and monosaccharide levels. Extensive experiments demonstrate that GlycanAA significantly outperforms state-of-the-art methods across multiple glycan property prediction tasks, including aqueous solubility and protein-binding affinity. An enhanced variant, PreGlycanAA, achieves further performance gains. The code and datasets are publicly available.
Abstract
Machine learning has shown preliminary promise for understanding the various properties of glycans. However, previous methods mainly model the backbone structure of a glycan as a graph of monosaccharides (i.e., sugar units) and neglect the atomic structure underlying each monosaccharide, which is an important indicator of glycan properties. We fill this gap by introducing the GlycanAA model for All-Atom-wise Glycan modeling. GlycanAA models a glycan as a heterogeneous graph, with monosaccharide nodes representing its global backbone structure and atom nodes representing its local atomic-level structures. On this graph, GlycanAA performs hierarchical message passing to capture interactions ranging from local atomic-level ones to global monosaccharide-level ones. To further enhance model capability, we pre-train GlycanAA on a high-quality unlabeled glycan dataset, deriving the PreGlycanAA model. For pre-training, we design a multi-scale mask prediction algorithm that teaches the model the dependencies at different levels of a glycan. Extensive benchmark results show the superiority of GlycanAA over existing glycan encoders and verify the further improvements achieved by PreGlycanAA. We maintain all resources at https://github.com/kasawa1234/GlycanAA
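To make the heterogeneous graph and the local-to-global ordering of message passing concrete, the following is a minimal sketch in plain Python. It is not the GlycanAA implementation: the class `HeteroGlycanGraph`, the function `hierarchical_pass`, the field names, and the choice of mean aggregation with a residual sum are all assumptions made for illustration, and the toy features carry no chemical meaning.

```python
from dataclasses import dataclass


@dataclass
class HeteroGlycanGraph:
    """Hypothetical toy container: two node types and three edge types."""
    atom_feats: list    # one feature vector (list of floats) per atom node
    mono_feats: list    # one feature vector per monosaccharide node
    atom_bonds: list    # undirected (atom_i, atom_j) covalent bonds
    atom_to_mono: list  # atom index -> index of the monosaccharide it belongs to
    glycosidic: list    # undirected (mono_i, mono_j) glycosidic links


def mean(vectors, dim):
    """Element-wise mean; a zero vector when there is nothing to aggregate."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def neighbors(num_nodes, edges):
    """Adjacency lists for an undirected edge list."""
    nbrs = [[] for _ in range(num_nodes)]
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    return nbrs


def hierarchical_pass(g):
    """One local-to-global round (assumed scheme: mean aggregation + residual)."""
    dim = len(g.atom_feats[0])

    # Step 1: atom <-> atom messages along covalent bonds (local structure).
    atom_nbrs = neighbors(len(g.atom_feats), g.atom_bonds)
    atoms = [add(h, mean([g.atom_feats[j] for j in atom_nbrs[i]], dim))
             for i, h in enumerate(g.atom_feats)]

    # Step 2: atom -> monosaccharide messages (pool atoms into their sugar unit).
    monos = []
    for m, h in enumerate(g.mono_feats):
        members = [atoms[a] for a, owner in enumerate(g.atom_to_mono) if owner == m]
        monos.append(add(h, mean(members, dim)))

    # Step 3: monosaccharide <-> monosaccharide messages along glycosidic links.
    mono_nbrs = neighbors(len(monos), g.glycosidic)
    monos = [add(h, mean([monos[j] for j in mono_nbrs[i]], dim))
             for i, h in enumerate(monos)]
    return atoms, monos


# Toy glycan: two monosaccharides, three atoms, one bond, one glycosidic link.
graph = HeteroGlycanGraph(
    atom_feats=[[1.0], [3.0], [5.0]],
    mono_feats=[[0.0], [0.0]],
    atom_bonds=[(0, 1)],
    atom_to_mono=[0, 0, 1],
    glycosidic=[(0, 1)],
)
atoms, monos = hierarchical_pass(graph)
print(atoms)  # [[4.0], [4.0], [5.0]]
print(monos)  # [[9.0], [9.0]]
```

In this sketch, atom information reaches a neighboring monosaccharide only after it has been pooled into its own sugar unit, which is one simple way to realize the local-to-global ordering described above. A real model would replace the mean-plus-residual updates with learned, edge-type-specific layers.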