GraphLake: A Purpose-Built Graph Compute Engine for Lakehouse

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of high startup latency and low query efficiency in graph analytics under lakehouse architectures. The authors propose a lakehouse-native graph processing engine that maps lakehouse tables to vertex and edge types in a property graph and enables efficient querying through GSQL. Key innovations include loading only graph topology to accelerate system initialization, designing a graph-aware caching mechanism, and developing two lakehouse-optimized parallel primitives for graph computation. Experimental evaluation demonstrates that the proposed system significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for lakehouse, in both startup time and query latency across a range of workloads.

📝 Abstract

In this paper, we introduce GraphLake, a purpose-built graph compute engine for Lakehouse. GraphLake is built on top of the commercial graph database TigerGraph. It maps Lakehouse tables to vertex and edge types in a labeled property graph and supports graph analytics over Lakehouse tables using GSQL. To minimize startup time, it loads only the graph topology. Furthermore, it introduces a series of techniques to ensure query efficiency over Lakehouse tables, including a graph-aware caching mechanism and two Lakehouse-optimized parallel primitives. Extensive experiments demonstrate that GraphLake significantly outperforms PuppyGraph, the current state-of-the-art graph compute engine for Lakehouse, achieving both lower startup time and lower query time.
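
The topology-only loading idea described in the abstract can be illustrated with a minimal Python sketch. This is not GraphLake's implementation: the table contents, the `TopologyGraph` class, and the `two_hop_names` helper are all hypothetical, the Lakehouse tables are stood in for by in-memory lists of rows, and a plain LRU cache is used in place of the paper's graph-aware caching mechanism, whose details are not given here. The sketch only shows the general shape: startup reads just the ids and edge endpoints, and property values are fetched from the table on demand.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# Hypothetical stand-ins for Lakehouse tables (rows as dicts).
PERSON_TABLE = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Lima"},
    {"id": 3, "name": "Cy", "city": "Rome"},
]
KNOWS_TABLE = [
    {"src": 1, "dst": 2, "since": 2019},
    {"src": 2, "dst": 3, "since": 2021},
]


@dataclass
class TopologyGraph:
    """Keeps only adjacency in memory; properties stay in the source table."""
    vertex_table: list
    adjacency: dict = field(default_factory=dict)
    cache: OrderedDict = field(default_factory=OrderedDict)
    cache_size: int = 2
    table_reads: int = 0  # counts on-demand property fetches

    @classmethod
    def load(cls, vertex_table, edge_table, cache_size=2):
        # Startup touches only the id and endpoint columns (the topology).
        g = cls(vertex_table=vertex_table, cache_size=cache_size)
        for row in vertex_table:
            g.adjacency[row["id"]] = []
        for row in edge_table:
            g.adjacency[row["src"]].append(row["dst"])
        return g

    def neighbors(self, vid):
        return self.adjacency[vid]

    def properties(self, vid):
        # Plain LRU cache (an assumption, not the paper's mechanism).
        if vid in self.cache:
            self.cache.move_to_end(vid)
            return self.cache[vid]
        self.table_reads += 1
        row = next(r for r in self.vertex_table if r["id"] == vid)
        props = {k: v for k, v in row.items() if k != "id"}
        self.cache[vid] = props
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict least recently used
        return props


def two_hop_names(g, start):
    """Traverse topology first, then fetch properties only for the results."""
    names = []
    for n1 in g.neighbors(start):
        for n2 in g.neighbors(n1):
            names.append(g.properties(n2)["name"])
    return names
```

In this sketch, calling `TopologyGraph.load(PERSON_TABLE, KNOWS_TABLE)` performs no property reads, and a two-hop traversal from vertex 1 fetches properties only for the single vertex it returns.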

Problem

Research questions and friction points this paper is trying to address.

Graph Compute Engine

Lakehouse

Graph Analytics

Query Efficiency

Startup Time

Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphLake

Lakehouse

Graph Compute Engine