GraphLake: A Purpose-Built Graph Compute Engine for Lakehouse

📅 2026-03-03
🤖 AI Summary
This work addresses two challenges of graph analytics in lakehouse architectures: high startup latency and low query efficiency. The authors propose a lakehouse-native graph processing engine that maps lakehouse tables to vertex and edge types in a labeled property graph and supports querying them through GSQL. Key innovations include loading only the graph topology to accelerate system initialization, a graph-aware caching mechanism, and two lakehouse-optimized parallel primitives for graph computation. Experimental evaluation shows that the system significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for lakehouses, in both startup time and query latency across a range of workloads.
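
The table-to-graph mapping and topology-only loading described above can be illustrated with a minimal Python sketch. It assumes a single vertex table and a single edge table held in memory as lists of dicts (a stand-in for real Lakehouse files), and builds a compressed sparse row (CSR) adjacency from the key columns only; every other column stays in the table. The CSR layout and all names here (`load_topology`, `neighbors`, `vertex_key`, `src_col`, `dst_col`) are hypothetical illustration choices, not GraphLake's or TigerGraph's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class CSR:
    """Compressed sparse row adjacency.

    Out-neighbors of dense vertex v are targets[offsets[v]:offsets[v + 1]].
    """
    offsets: list[int]
    targets: list[int]
    id_to_dense: dict[object, int]  # primary-key value -> dense vertex index


def load_topology(vertex_rows, edge_rows, vertex_key, src_col, dst_col):
    """Build a CSR from key columns only (hypothetical helper).

    Rows are dicts standing in for Lakehouse table rows; non-key columns
    are never read, mirroring the idea of loading topology only.
    """
    # Assign a dense index to each distinct primary key, in first-seen order.
    id_to_dense = {}
    for row in vertex_rows:
        id_to_dense.setdefault(row[vertex_key], len(id_to_dense))
    n = len(id_to_dense)

    # Translate edge endpoints to dense indices and count out-degrees.
    degree = [0] * n
    pairs = []
    for row in edge_rows:
        s, d = id_to_dense[row[src_col]], id_to_dense[row[dst_col]]
        pairs.append((s, d))
        degree[s] += 1

    # Prefix sum of degrees gives each vertex's start offset.
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]

    # Scatter each edge target into its source vertex's slot range.
    targets = [0] * len(pairs)
    cursor = offsets[:n]  # slicing copies, so offsets is not mutated
    for s, d in pairs:
        targets[cursor[s]] = d
        cursor[s] += 1
    return CSR(offsets, targets, id_to_dense)


def neighbors(csr, v):
    """Return the dense indices of v's out-neighbors."""
    return csr.targets[csr.offsets[v]:csr.offsets[v + 1]]


# Usage: a toy "Person" vertex table and "Knows" edge table.
persons = [{"id": "a", "name": "Ann"}, {"id": "b", "name": "Bo"}, {"id": "c", "name": "Cy"}]
knows = [{"src": "a", "dst": "b"}, {"src": "a", "dst": "c"}, {"src": "c", "dst": "b"}]
graph = load_topology(persons, knows, "id", "src", "dst")
print(graph.offsets)                             # [0, 2, 2, 3]
print(neighbors(graph, graph.id_to_dense["a"]))  # [1, 2]
```

In this sketch the `name` column is never read, so startup work depends on the number of keys and edges rather than on how many attribute columns the tables carry.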

📝 Abstract
In this paper, we introduce GraphLake, a purpose-built graph compute engine for Lakehouse. GraphLake is built on top of the commercial graph database TigerGraph. It maps Lakehouse tables to vertex and edge types in a labeled property graph and supports graph analytics over Lakehouse tables using GSQL. To minimize startup time, it loads only the graph topology. Furthermore, it introduces a series of techniques to ensure query efficiency over Lakehouse tables, including a graph-aware caching mechanism and two Lakehouse-optimized parallel primitives. Extensive experiments demonstrate that GraphLake significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for Lakehouse, achieving both lower startup time and lower query time.
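
Since only the topology is loaded at startup, attribute values presumably have to be read from Lakehouse tables when a query needs them, which is where a cache pays off. The abstract does not say what makes GraphLake's cache graph-aware, so the sketch below substitutes a plain least-recently-used (LRU) policy keyed by vertex ID. `AttributeCache` and its `fetch` callback (standing in for a table read) are hypothetical names, not part of GraphLake or TigerGraph.

```python
from collections import OrderedDict


class AttributeCache:
    """Plain LRU cache for per-vertex attributes fetched on demand.

    Hypothetical sketch: `fetch` stands in for reading a vertex's row from a
    Lakehouse table. This is ordinary LRU, not GraphLake's graph-aware policy.
    """

    def __init__(self, fetch, capacity):
        self._fetch = fetch
        self._capacity = capacity
        self._entries = OrderedDict()  # vertex_id -> attributes, oldest first
        self.misses = 0

    def get(self, vertex_id):
        if vertex_id in self._entries:
            self._entries.move_to_end(vertex_id)  # mark as most recently used
            return self._entries[vertex_id]
        self.misses += 1
        value = self._fetch(vertex_id)  # cache miss: go to the table
        self._entries[vertex_id] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


# Usage: a dict plays the role of the attribute table.
table = {"a": {"name": "Ann"}, "b": {"name": "Bo"}, "c": {"name": "Cy"}}
cache = AttributeCache(table.__getitem__, capacity=2)
for vid in ["a", "b", "a", "c", "b"]:
    cache.get(vid)
print(cache.misses)  # 4
```

With capacity 2, the repeated access to `a` is a hit, while `c` evicts `b` and the final `b` evicts `a`, giving four misses in total.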

Problem

Research questions and friction points that this paper addresses.

Graph Compute Engine
Lakehouse
Graph Analytics
Query Efficiency
Startup Time

Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphLake
Lakehouse
graph compute engine
graph-aware caching
parallel primitives