🤖 AI Summary
Existing protein representation methods struggle to model the intrinsic hierarchical structure of proteins (residues → secondary structures → tertiary conformations). To address this, the authors propose Topotein, a topological deep learning framework for proteins that explicitly encodes multi-scale geometric and topological features via the Protein Combinatorial Complex (PCC). They further design the Topology-Complete Perceptron Network (TCPNet), which applies SE(3)-equivariant message passing across the levels of the PCC to aggregate features between hierarchical levels while preserving geometric information. Evaluated on four protein representation learning tasks, including fold classification, TCPNet consistently outperforms state-of-the-art geometric graph neural networks, with particular strength on tasks such as fold classification that depend on how secondary structures are arranged. The authors take these results as evidence that hierarchical topological features matter for protein analysis.
📝 Abstract
Protein representation learning (PRL) is crucial for understanding structure-function relationships, yet current sequence- and graph-based methods fail to capture the hierarchical organization inherent in protein structures. We introduce Topotein, a comprehensive framework that applies topological deep learning to PRL through the novel Protein Combinatorial Complex (PCC) and Topology-Complete Perceptron Network (TCPNet). Our PCC represents proteins at multiple hierarchical levels, from residues to secondary structures to complete proteins, while preserving geometric information at each level. TCPNet employs SE(3)-equivariant message passing across these hierarchical structures, enabling more effective capture of multi-scale structural patterns. Through extensive experiments on four PRL tasks, TCPNet consistently outperforms state-of-the-art geometric graph neural networks. Our approach demonstrates particular strength in tasks such as fold classification, which require understanding of secondary structure arrangements, validating the importance of hierarchical topological features for protein analysis.
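To make the hierarchy in the abstract concrete, here is a minimal Python sketch of a toy multi-level protein representation. It is an illustration only, not the paper's PCC construction: the rank numbering (0 = residues, 1 = residue-residue edges, 2 = secondary structure elements, 3 = whole protein), the distance cutoff, the per-cell centroid as "geometric information", and the names `build_toy_pcc` and `centroid` are all assumptions made for this example.

```python
from itertools import groupby


def build_toy_pcc(ss_labels, coords, cutoff=4.0):
    """Group residue indices into cells of increasing rank.

    Hypothetical helper: the ranks and the cutoff are assumptions for
    illustration, not the construction used in the paper.
    """
    n = len(ss_labels)
    if len(coords) != n:
        raise ValueError("need one coordinate per residue")

    # Rank 0: one cell per residue.
    residues = [(i,) for i in range(n)]

    # Rank 1: edges between residues closer than the cutoff (in angstroms).
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dist = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j])) ** 0.5
            if dist <= cutoff:
                edges.append((i, j))

    # Rank 2: maximal runs of identical secondary-structure labels.
    sses, start = [], 0
    for _, run in groupby(ss_labels):
        length = len(list(run))
        sses.append(tuple(range(start, start + length)))
        start += length

    # Rank 3: a single cell covering the whole protein.
    protein = [tuple(range(n))]

    return {0: residues, 1: edges, 2: sses, 3: protein}


def centroid(cell, coords):
    """Mean position of the residues in a cell (one simple geometric feature)."""
    k = len(cell)
    return tuple(sum(coords[i][axis] for i in cell) / k for axis in range(3))


# Toy input: 9 residues on a straight line, 3.8 angstroms apart.
ss = "HHHHEEECC"  # H = helix, E = strand, C = coil
xyz = [(3.8 * i, 0.0, 0.0) for i in range(len(ss))]

pcc = build_toy_pcc(ss, xyz)
for rank, cells in pcc.items():
    print(rank, len(cells))

helix_center = centroid(pcc[2][0], xyz)
```

Every cell is just a set of residue indices, so a geometric feature such as a centroid can be computed the same way at each level. A network in the spirit of TCPNet would then pass messages between cells of different ranks; that part is not sketched here.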