GeoGNN: Quantifying and Mitigating Semantic Drift in Text-Attributed Graphs

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Graph Neural Networks (GNNs) applied to Text-Attributed Graphs (TAGs) suffer from semantic drift when they aggregate pre-trained language model (PLM) text embeddings in linear space. Method: We propose a manifold-aware geodesic aggregation mechanism that operates on the unit sphere. We also introduce a semantic drift metric based on local PCA, which gives a quantitative framework linking aggregation operations to preservation of the semantic manifold structure. The approach uses log-exp mappings for geodesic aggregation together with spherical attention, enabling geometry-aware message passing without modifying PLM parameters. Contribution/Results: By combining PLM-based text encoding, manifold interpolation, and geodesic propagation, the method substantially mitigates semantic drift across four benchmark TAG datasets and multiple text encoders. It consistently outperforms strong baselines, which empirically supports manifold-geometric modeling for TAG representation learning.

📝 Abstract
Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. However, the representation spaces of modern PLMs are highly nonlinear and geometrically structured: textual embeddings reside on curved semantic manifolds rather than in flat Euclidean spaces. Linear aggregation on such manifolds inevitably distorts geometry and causes semantic drift, a phenomenon in which aggregated representations deviate from the intrinsic manifold, losing semantic fidelity and expressive power. To quantitatively investigate this problem, this work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework to analyze how different aggregation mechanisms affect manifold structure. Building upon these insights, we propose Geodesic Aggregation, a manifold-aware mechanism that aggregates neighbor information along geodesics via log-exp mappings on the unit sphere, ensuring that representations remain faithful to the semantic manifold during message passing. We further develop GeoGNN, a practical instantiation that integrates spherical attention with manifold interpolation. Extensive experiments across four benchmark datasets and multiple text encoders show that GeoGNN substantially mitigates semantic drift and consistently outperforms strong baselines, establishing the importance of manifold-aware aggregation in text-attributed graph learning.
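The log-exp aggregation described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`log_map`, `exp_map`, `geodesic_aggregate`) are hypothetical, the weights stand in for whatever the spherical attention would produce, and a single tangent-space averaging step is assumed rather than any iterative scheme the paper may use.

```python
import numpy as np


def normalize(x, eps=1e-12):
    """Project vectors onto the unit sphere (last axis)."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


def log_map(p, q, eps=1e-12):
    """Log map at p: tangent vectors at p pointing toward each row of q.

    Each tangent vector has length equal to the geodesic (arc) distance.
    Antipodal points have no unique direction and map to the zero vector here.
    """
    cos_theta = np.clip(q @ p, -1.0, 1.0)            # shape (n,)
    theta = np.arccos(cos_theta)                      # arc length to each neighbor
    u = q - cos_theta[:, None] * p                    # part of q orthogonal to p
    u_norm = np.linalg.norm(u, axis=-1, keepdims=True)
    return theta[:, None] * u / np.maximum(u_norm, eps)


def exp_map(p, v, eps=1e-12):
    """Exp map at p: walk along the geodesic from p in tangent direction v."""
    norm = np.linalg.norm(v)
    if norm < eps:
        return p
    return np.cos(norm) * p + np.sin(norm) * v / norm


def geodesic_aggregate(center, neighbors, weights=None):
    """Weighted average of neighbors in the tangent space at `center`,
    mapped back to the sphere. Uniform weights are an assumption."""
    p = normalize(np.asarray(center, dtype=float))
    q = normalize(np.asarray(neighbors, dtype=float))
    if weights is None:
        weights = np.full(len(q), 1.0 / len(q))
    weights = np.asarray(weights, dtype=float)
    tangent = (weights[:, None] * log_map(p, q)).sum(axis=0)
    return exp_map(p, tangent)
```

Because the weighted sum is taken in the tangent space at the center node and then mapped back with the exp map, the output stays on the unit sphere by construction, which is the property the abstract attributes to geodesic aggregation.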
Problem

Research questions and friction points this paper is trying to address.

Quantifying semantic drift in graph neural networks on text-attributed graphs
Mitigating geometric distortion from linear aggregation on semantic manifolds
Developing manifold-aware aggregation to preserve semantic fidelity during message passing
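A toy illustration of the distortion named above, assuming the unit sphere as a stand-in for the semantic manifold: the arithmetic mean of unit-norm embeddings generally has norm below 1, so it no longer lies on the sphere. The helper `linear_mean_norm` and the random data are hypothetical, and this norm gap is not the drift metric the paper defines.

```python
import numpy as np


def linear_mean_norm(embeddings):
    """Norm of the plain (Euclidean) mean of L2-normalized embeddings.

    A value of 1.0 means the mean is still on the unit sphere;
    anything smaller means linear aggregation has left it.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return float(np.linalg.norm(unit.mean(axis=0)))


# Hypothetical neighborhood: 8 random 16-dimensional "embeddings".
rng = np.random.default_rng(0)
neighbors = rng.normal(size=(8, 16))
print(linear_mean_norm(neighbors))
```

The more spread out the neighbors are on the sphere, the further the linear mean falls inside it.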
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces a local PCA-based metric to quantify semantic drift
Proposes Geodesic Aggregation using log-exp mappings
Develops GeoGNN with a spherical attention mechanism
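The page does not give the formula for the local PCA metric, so the following is only one plausible reading, sketched under stated assumptions: estimate a local tangent subspace from the k nearest original embeddings with PCA, then report the fraction of an aggregated point's offset that falls outside that subspace. The function name `local_pca_drift` and the parameters `k` and `n_components` are hypothetical.

```python
import numpy as np


def local_pca_drift(point, reference, k=10, n_components=2):
    """Off-subspace fraction of `point` relative to its local neighborhood.

    reference: (n, d) original embeddings assumed to lie on the manifold.
    Returns a value in [0, 1]; 0 means the point lies in the local
    PCA subspace, 1 means its offset is entirely orthogonal to it.
    """
    point = np.asarray(point, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # k nearest reference embeddings in Euclidean distance.
    dists = np.linalg.norm(reference - point, axis=1)
    local = reference[np.argsort(dists)[:k]]
    mean = local.mean(axis=0)

    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(local - mean, full_matrices=False)
    basis = vt[:n_components]

    offset = point - mean
    residual = offset - basis.T @ (basis @ offset)   # part outside the subspace
    denom = np.linalg.norm(offset)
    return 0.0 if denom == 0.0 else float(np.linalg.norm(residual) / denom)
```

Under this reading, a representation produced by linear aggregation would score higher than one produced by a manifold-aware aggregation whenever it sits further from the locally estimated subspace.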
Liangwei Yang
Salesforce Research
Network Science, Recommender System, Efficient Modeling
Jing Ma
Independent Researcher, Palo Alto, CA, USA
Jianguo Zhang
Salesforce AI Research, Palo Alto, CA, USA
Zhiwei Liu
Salesforce AI Research, Palo Alto, CA, USA
Jielin Qiu
Research Scientist, Salesforce AI Research
Machine Learning, Multimodal Learning
Shirley Kokane
Salesforce Research
Shiyu Wang
Salesforce AI Research, Palo Alto, CA, USA
Haolin Chen
Salesforce AI Research, Palo Alto, CA, USA
Rithesh Murthy
Salesforce AI Research, Palo Alto, CA, USA
Ming Zhu
Salesforce AI Research, Palo Alto, CA, USA
Huan Wang
Salesforce AI Research, Palo Alto, CA, USA
Weiran Yao
Salesforce AI Research, Palo Alto, CA, USA
Caiming Xiong
Salesforce Research
Machine Learning, NLP, Computer Vision, Multimedia, Data Mining
Shelby Heinecke
Salesforce Research
Artificial Intelligence, AI Agents, LLM Agents, Multi-Agent Systems, Recommendation Systems