🤖 AI Summary
Problem: Graph neural networks (GNNs) applied to text-attributed graphs (TAGs) suffer from semantic drift when they aggregate pretrained language model (PLM) text embeddings linearly, because those embeddings lie on curved semantic manifolds rather than in flat Euclidean space. Method: We introduce a local-PCA-based metric that quantifies semantic drift, together with a quantitative framework for analyzing how different aggregation operations affect semantic manifold structure. Building on this analysis, we propose a manifold-aware geodesic aggregation mechanism that operates on the unit sphere: neighbor information is aggregated along geodesics via log–exp mappings and combined with spherical attention, enabling geometry-aware message passing without modifying PLM parameters. Contribution/Results: By integrating PLM-based text encoding, manifold interpolation, and geodesic propagation, our method substantially mitigates semantic drift across four benchmark TAG datasets and multiple text encoders. It consistently outperforms strong baselines, supporting the value of manifold-aware modeling for TAG representation learning.
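The summary does not spell out how the local-PCA drift metric is computed. The sketch below is one plausible reading and should not be taken as the authors' definition: it estimates a local tangent subspace from the `k` nearest reference embeddings with PCA, then reports how much of an aggregated vector's offset from the local mean falls outside that subspace. The function name `local_pca_drift` and the parameters `k` and `d` are hypothetical.

```python
import numpy as np

def local_pca_drift(z, embeddings, k=10, d=5):
    """Hypothetical drift score in [0, 1]: the fraction of z's offset from the
    local mean that is NOT explained by the top-d local principal directions.
    This is an assumed formulation, not the metric defined in the paper."""
    # Find the k reference embeddings closest to z (Euclidean distance).
    dists = np.linalg.norm(embeddings - z, axis=1)
    nbrs = embeddings[np.argsort(dists)[:k]]
    # Local PCA: the right singular vectors of the centered neighbors
    # are the principal directions of the neighborhood.
    mu = nbrs.mean(axis=0)
    _, _, vt = np.linalg.svd(nbrs - mu, full_matrices=False)
    basis = vt[:d]  # (d, dim), orthonormal rows spanning the tangent estimate
    # Split the offset into tangent and off-manifold (residual) parts.
    offset = z - mu
    residual = offset - basis.T @ (basis @ offset)
    return np.linalg.norm(residual) / max(np.linalg.norm(offset), 1e-12)
```

Under this reading, a score near 0 means the aggregated vector stays within the locally estimated manifold, and a score near 1 means it has moved almost entirely off it.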
📝 Abstract
Graph neural networks (GNNs) on text-attributed graphs (TAGs) typically encode node texts using pretrained language models (PLMs) and propagate these embeddings through linear neighborhood aggregation. However, the representation spaces of modern PLMs are highly nonlinear and geometrically structured: textual embeddings reside on curved semantic manifolds rather than in flat Euclidean spaces. Linear aggregation on such manifolds inevitably distorts geometry and causes semantic drift, a phenomenon in which aggregated representations deviate from the intrinsic manifold and lose semantic fidelity and expressive power. To investigate this problem quantitatively, this work introduces a local PCA-based metric that measures the degree of semantic drift and provides the first quantitative framework for analyzing how different aggregation mechanisms affect manifold structure. Building on these insights, we propose Geodesic Aggregation, a manifold-aware mechanism that aggregates neighbor information along geodesics via log-exp mappings on the unit sphere, ensuring that representations remain faithful to the semantic manifold during message passing. We further develop GeoGNN, a practical instantiation that integrates spherical attention with manifold interpolation. Extensive experiments across four benchmark datasets and multiple text encoders show that GeoGNN substantially mitigates semantic drift and consistently outperforms strong baselines, establishing the importance of manifold-aware aggregation in text-attributed graph learning.
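To make the log-exp aggregation step concrete, here is a minimal NumPy sketch using the standard logarithmic and exponential maps on the unit sphere. It is an illustration under stated assumptions, not the GeoGNN implementation: the function names are hypothetical, and the `weights` argument stands in for whatever the spherical attention would produce (uniform weights are assumed by default).

```python
import numpy as np

def normalize(x, eps=1e-12):
    """Project vectors onto the unit sphere."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def log_map(p, q, eps=1e-12):
    """Log map at p: tangent vectors at p pointing toward each row of q,
    with length equal to the geodesic (angular) distance.
    Antipodal points have no unique direction and map to zero here."""
    cos = np.clip(q @ p, -1.0, 1.0)
    theta = np.arccos(cos)                 # geodesic distance to each neighbor
    perp = q - cos[:, None] * p            # part of q orthogonal to p
    norm = np.linalg.norm(perp, axis=-1, keepdims=True)
    return theta[:, None] * perp / np.maximum(norm, eps)

def exp_map(p, v, eps=1e-12):
    """Exp map at p: walk along the geodesic in direction v for length |v|."""
    t = np.linalg.norm(v)
    if t < eps:
        return p
    return np.cos(t) * p + np.sin(t) * v / t

def geodesic_aggregate(center, neighbors, weights=None):
    """Weighted average of neighbors in the tangent space at the center,
    mapped back to the sphere (assumed form of the aggregation)."""
    p = normalize(center)
    q = normalize(neighbors)
    if weights is None:
        weights = np.full(len(q), 1.0 / len(q))  # assumption: uniform weights
    v = weights @ log_map(p, q)
    return exp_map(p, v)

center = np.array([1.0, 0.0, 0.0])
neighbors = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
geo = geodesic_aggregate(center, neighbors)
lin = np.vstack([center, neighbors]).mean(axis=0)
print(np.linalg.norm(geo), np.linalg.norm(lin))
```

The geodesic result has unit norm up to floating-point error, whereas the linear mean of the same vectors falls inside the sphere, which is the kind of off-manifold deviation the abstract calls semantic drift.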