🤖 AI Summary
This work addresses the limitations of traditional Euclidean distance–based embedding retrieval, which fails to capture complex semantic paths and local geometric structure in citation graphs. The authors propose a geometry-aware semantic retrieval method that learns a low-rank Riemannian metric at each node, inducing local positive semi-definite metric spaces. Coarse-grained retrieval runs a geodesic distance–driven, multi-source Dijkstra algorithm that yields path-interpretable results; fine-grained reranking then applies Maximal Marginal Relevance (MMR) and path-consistency filtering. The key innovation is integrating learnable node-level Riemannian metrics into citation retrieval, combined with a hierarchical search strategy that substantially reduces computational overhead. On a benchmark of 169,000 papers, the method improves Recall@20 by 23% over SPECTER+FAISS, cuts computational cost by 75% through hierarchical search, and retains 97% of retrieval quality.
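The per-node metric construction the summary describes (a low-rank factor plus a scaled identity, guaranteeing a valid positive-definite metric) can be sketched in a few lines of NumPy. This is a minimal illustration of the parameterization, not the authors' implementation; the function names, dimensions, and `eps` value are assumptions:

```python
import numpy as np

def local_metric(L, eps=1e-3):
    """Build the local metric G = L @ L.T + eps * I from a low-rank
    factor L of shape (d, r). The eps * I term makes G strictly
    positive definite even when r < d."""
    d = L.shape[0]
    return L @ L.T + eps * np.eye(d)

def local_distance(x, y, G):
    """Mahalanobis-style distance between embeddings x and y under
    the local metric G: sqrt((x - y)^T G (x - y))."""
    diff = x - y
    return float(np.sqrt(diff @ G @ diff))

rng = np.random.default_rng(0)
d, r = 8, 2                      # illustrative sizes, not from the paper
L = rng.normal(size=(d, r))
G = local_metric(L)
x, y = rng.normal(size=d), rng.normal(size=d)
dist = local_distance(x, y, G)   # non-negative; zero only when x == y
```

In the full system these per-node distances would serve as edge weights for the geodesic (Dijkstra) search stage.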
📝 Abstract
We present Geodesic Semantic Search (GSS), a retrieval system that learns node-specific Riemannian metrics on citation graphs to enable geometry-aware semantic search. Unlike standard embedding-based retrieval that relies on fixed Euclidean distances, GSS learns a low-rank metric factor $\mathbf{L}_i \in \mathbb{R}^{d \times r}$ at each node, inducing a local positive semi-definite metric $\mathbf{G}_i = \mathbf{L}_i \mathbf{L}_i^\top + \epsilon \mathbf{I}$. This parameterization guarantees valid metrics while keeping the model tractable. Retrieval proceeds via multi-source Dijkstra on the learned geodesic distances, followed by Maximal Marginal Relevance reranking and path-coherence filtering. On citation prediction benchmarks with 169K papers, GSS achieves a 23% relative improvement in Recall@20 over SPECTER+FAISS baselines while providing interpretable citation paths. Our hierarchical coarse-to-fine search with k-means pooling reduces computational cost by $4\times$ compared to flat geodesic search while maintaining 97% retrieval quality. We provide theoretical analysis of when geodesic distances outperform direct similarity, characterize the approximation quality of low-rank metrics, and validate the predictions empirically. Code and trained models are available at https://github.com/YCRG-Labs/geodesic-search.
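The Maximal Marginal Relevance reranking stage mentioned in the abstract follows a standard greedy scheme: at each step, pick the candidate that best trades off relevance to the query against redundancy with already-selected items. The sketch below is the generic MMR formulation, not the authors' implementation; the trade-off parameter `lam` and the similarity inputs are illustrative assumptions:

```python
import numpy as np

def mmr_rerank(query_sim, doc_sims, k, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    query_sim: (n,) similarity of each candidate to the query.
    doc_sims:  (n, n) pairwise similarities between candidates.
    Returns indices of the k selected candidates, in selection order.
    """
    n = len(query_sim)
    selected, remaining = [], set(range(n))
    while remaining and len(selected) < k:
        best, best_score = None, -np.inf
        for i in remaining:
            # Redundancy = max similarity to anything already chosen.
            redundancy = max((doc_sims[i, j] for j in selected), default=0.0)
            score = lam * query_sim[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam = 1.0` this reduces to plain relevance ranking; lower values increasingly penalize near-duplicate results, which is the intended diversification effect in the fine-grained stage.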