🤖 AI Summary
Problem: Traditional RAG systems analyze multimodal data inefficiently because vector retrieval and graph querying are decoupled. Method: This paper introduces a vector search architecture natively integrated into TigerGraph, presented as a first-of-its-kind design, featuring (i) extended vertex properties supporting native embedding types, (ii) an MPP-coordinated indexing framework unifying vector and graph structural indices, (iii) enhanced GSQL with vector expressions and semantic-level vector-graph hybrid query capabilities, and (iv) a unified graph-vector execution engine. Contribution/Results: Implemented in TigerGraph v4.2 (released December 2024), the solution achieves millisecond-scale hybrid retrieval on billion-edge graphs and outperforms Neo4j, Amazon Neptune, and Milvus in benchmark evaluations. It also demonstrates strong scalability and enables joint analysis of structured and unstructured data within a single system.
📝 Abstract
In this paper, we introduce TigerVector, a system that integrates vector search and graph query within TigerGraph, a Massively Parallel Processing (MPP) native graph database. We extend the vertex attribute type with the embedding type. To support fast vector search, we devise an MPP index framework that interoperates efficiently with the graph engine. The graph query language GSQL is enhanced to support vector type expressions and enable query compositions between vector search results and graph query blocks. These advancements elevate the expressive power and analytical capabilities of graph databases, enabling seamless fusion of unstructured and structured data in ways previously unattainable. Through extensive experiments, we demonstrate TigerVector's hybrid search capability, scalability, and superior performance compared to other graph databases (including Neo4j and Amazon Neptune) and a highly optimized specialized vector database (Milvus). TigerVector was integrated into TigerGraph v4.2, the latest release of TigerGraph, in December 2024.
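To make the idea of composing vector search results with a graph query block concrete, here is a minimal Python sketch. It is not TigerVector's implementation and does not use GSQL. The segment layout, the brute-force cosine search, and every name in it (`segment_top_k`, `vector_search`, `expand_one_hop`, the toy vertices) are assumptions made for illustration. It shows embeddings stored as vertex attributes and partitioned into segments, a per-segment top-k search whose partial results are merged globally (a rough stand-in for an MPP-style index), and a one-hop graph traversal seeded by the vector search result.

```python
import heapq
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def segment_top_k(segment, query, k):
    """Search one segment: return its k closest (distance, vertex_id) pairs."""
    scored = ((cosine_distance(emb, query), vid) for vid, emb in segment.items())
    return heapq.nsmallest(k, scored)


def vector_search(segments, query, k):
    """Search each segment independently, then merge the partial top-k lists.

    Assumption: a real MPP system would run the per-segment searches in
    parallel and use an approximate index; this sketch is sequential and exact.
    """
    partials = [segment_top_k(seg, query, k) for seg in segments]
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    return [vid for _, vid in merged]


def expand_one_hop(edges, seeds):
    """Graph step: collect the out-neighbors of the seed vertices."""
    return {dst for src in seeds for dst in edges.get(src, ())}


# Hypothetical toy data: document vertices carry an embedding attribute and
# are split across two segments; edges link documents to author vertices.
segments = [
    {"doc1": [1.0, 0.0], "doc2": [0.0, 1.0]},
    {"doc3": [0.9, 0.1], "doc4": [-1.0, 0.0]},
]
edges = {"doc1": ["alice"], "doc2": ["carol"], "doc3": ["bob", "alice"]}

# Hybrid query: the vector search result becomes the seed set of a graph hop.
seeds = vector_search(segments, [1.0, 0.0], k=2)
authors = expand_one_hop(edges, seeds)
print(seeds, sorted(authors))
```

The point of the sketch is the composition: the output of the vector step is an ordinary vertex set, so it can feed directly into a traversal without moving data between two separate systems.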