Product Manifold Representations for Learning on Biological Pathways

📅 2024-01-27
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Biological pathway graphs have complex, heterogeneous topologies and are substantially distorted when embedded in Euclidean space. To address this, we investigate MC-GCN (Mixed-Curvature Product Graph Convolutional Network), a non-Euclidean graph convolutional network that learns pathway node embeddings on a product manifold whose factors have different curvatures, and compare it against traditional Euclidean graph representation learning models. The learned node embeddings are then used to train a supervised model that predicts missing protein-protein interactions. The key idea is to bring mixed-curvature product-manifold representations, which have not been studied extensively in the biological domain, to pathway embedding learning, so that different factors of the product space can accommodate substructures whose curvature varies across the graph. Experiments show large reductions in embedding distortion and improved in-distribution protein-protein interaction prediction. However, mixed-curvature representations underperform existing baselines on out-of-distribution edge prediction, which suggests that they may overfit to the training graph topology. The model code and pathway analysis code are publicly available.
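To make "product manifold with mixed curvature" concrete, here is a minimal sketch of a distance on a product space with one hyperbolic (Poincaré ball), one spherical, and one Euclidean factor. It assumes unit curvatures (-1, +1, 0) and combines per-factor geodesic distances with the ℓ2 rule commonly used for product-space embeddings. The function names and the example space H² × S² × E² are illustrative assumptions, not code taken from the MC-GCN repository.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]


def euclidean_dist(x: Vector, y: Vector) -> float:
    """Flat factor (curvature 0)."""
    return math.dist(x, y)


def poincare_dist(x: Vector, y: Vector) -> float:
    """Poincare ball with curvature -1; both points need norm < 1."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_x) * (1 - sq_y)))


def sphere_dist(x: Vector, y: Vector) -> float:
    """Unit sphere with curvature +1; both points need unit norm."""
    dot = sum(a * b for a, b in zip(x, y))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot)))


def product_dist(x_parts, y_parts, metrics: Sequence[Metric]) -> float:
    """l2 combination of the per-factor geodesic distances."""
    return math.sqrt(
        sum(d(x, y) ** 2 for d, x, y in zip(metrics, x_parts, y_parts))
    )


# Hypothetical example space: H^2 x S^2 x E^2 (the sphere factor lives in R^3).
metrics = [poincare_dist, sphere_dist, euclidean_dist]
u = ([0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0])
v = ([0.5, 0.0], [0.0, 1.0, 0.0], [3.0, 4.0])
print(product_dist(u, v, metrics))  # sqrt(ln(3)^2 + (pi/2)^2 + 5^2)
```

Because each factor contributes its own geodesic distance, a tree-like region of a pathway can be spread out in the hyperbolic factor while cyclic structure is absorbed by the spherical factor.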

📝 Abstract
Machine learning models that embed graphs in non-Euclidean spaces have shown substantial benefits in a variety of contexts, but their application has not been studied extensively in the biological domain, particularly with respect to biological pathway graphs. Such graphs exhibit a variety of complex network structures, presenting challenges to existing embedding approaches. Learning high-quality embeddings for biological pathway graphs is important for researchers looking to understand the underpinnings of disease and train high-quality predictive models on these networks. In this work, we investigate the effects of embedding pathway graphs in non-Euclidean mixed-curvature spaces and compare against traditional Euclidean graph representation learning models. We then train a supervised model using the learned node embeddings to predict missing protein-protein interactions in pathway graphs. We find large reductions in distortion and boosts on in-distribution edge prediction performance as a result of using mixed-curvature embeddings and their corresponding graph neural network models. However, we find that mixed-curvature representations underperform existing baselines on out-of-distribution edge prediction performance, suggesting that these representations may overfit to the training graph topology. We provide our Mixed-Curvature Product Graph Convolutional Network code at https://github.com/mcneela/Mixed-Curvature-GCN and our pathway analysis code at https://github.com/mcneela/Mixed-Curvature-Pathways.
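The distortion results can be illustrated with a minimal sketch of average distortion, assuming the definition common in the product-space embedding literature: the mean over connected node pairs of |d_emb - d_graph| / d_graph, where d_graph is the shortest-path hop count. The abstract does not state which distortion measure the authors report, and `bfs_distances` and `average_distortion` are hypothetical helper names. Euclidean `math.dist` stands in for the embedding metric; any manifold distance function with the same two-argument signature could be passed instead.

```python
import math
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_distortion(adj, emb, dist_fn=math.dist):
    """Mean of |d_emb - d_graph| / d_graph over connected, unordered node pairs."""
    nodes = sorted(adj)
    total, pairs = 0.0, 0
    for i, u in enumerate(nodes):
        d_graph = bfs_distances(adj, u)  # one BFS per source node
        for w in nodes[i + 1:]:
            if w not in d_graph:  # skip pairs in different components
                continue
            d_emb = dist_fn(emb[u], emb[w])
            total += abs(d_emb - d_graph[w]) / d_graph[w]
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy path graph a - b - c.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(average_distortion(path, {"a": [0.0], "b": [1.0], "c": [2.0]}))  # 0.0
print(average_distortion(path, {"a": [0.0], "b": [1.0], "c": [3.0]}))  # 0.5
```

A distortion of 0 means embedding distances reproduce graph distances exactly; lower values indicate a more faithful embedding.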
Problem

Research questions and friction points this paper is trying to address.

Improving biological pathway graph embeddings using non-Euclidean spaces.
Predicting missing protein-protein interactions in pathway graphs.
Evaluating mixed-curvature embeddings' performance on edge prediction tasks.
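Predicting missing interactions from embeddings requires some rule that turns a pair of node embeddings into an edge score. The abstract does not specify the supervised model, so the sketch below swaps in the Fermi-Dirac decoder used in hyperbolic GCN work, which maps an embedding distance d to a probability 1 / (exp((d² - r) / t) + 1). The radius `r`, temperature `t`, the helper names, and the protein identifiers `P1` to `P3` are illustrative assumptions, and Euclidean `math.dist` stands in for the manifold distance.

```python
import math


def fermi_dirac(d, r=2.0, t=1.0):
    """Edge probability 1 / (exp((d^2 - r) / t) + 1), computed stably."""
    z = (d * d - r) / t
    if z >= 0:
        e = math.exp(-z)  # e <= 1, so distant pairs cannot overflow exp
        return e / (1.0 + e)
    return 1.0 / (math.exp(z) + 1.0)


def rank_candidate_edges(emb, candidates, dist_fn=math.dist, r=2.0, t=1.0):
    """Score candidate (u, v) pairs and sort them from most to least likely."""
    scored = [
        (u, v, fermi_dirac(dist_fn(emb[u], emb[v]), r, t)) for u, v in candidates
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical 2-D embeddings for three proteins.
emb = {"P1": [0.0, 0.0], "P2": [0.1, 0.0], "P3": [3.0, 0.0]}
for u, v, p in rank_candidate_edges(emb, [("P1", "P3"), ("P1", "P2")]):
    print(u, v, round(p, 3))
```

Nearby embeddings receive probabilities close to 1 and distant ones close to 0. In a supervised setup, `r`, `t`, and the embeddings would typically be fit with a binary cross-entropy loss on known edges and sampled non-edges.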
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-Euclidean mixed-curvature embeddings
Supervised model for protein interactions
Mixed-Curvature Product Graph Convolutional Network
Daniel McNeela
Department of Computer Sciences, University of Wisconsin-Madison; Morgridge Institute for Research; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
Frederic Sala
Assistant Professor, University of Wisconsin
Data-centric AI, Machine learning, Information theory
Anthony Gitter
Associate Professor, University of Wisconsin-Madison; Morgridge Institute for Research
Computational biology, Bioinformatics