Aster: Enhancing LSM-structures for Scalable Graph Database

📅 2025-01-11

🤖 AI Summary

To address storage redundancy, high update overhead, and elevated query latency in large-scale, evolving graphs under update- and lookup-intensive workloads, this paper proposes Poly-LSM, a graph-oriented LSM-tree key-value storage engine, and Aster, a graph database built on top of it. The core contributions are: (1) a graph-oriented LSM-tree structure with a hybrid storage model for storing graph data concisely; (2) an adaptive, I/O-efficient mechanism for handling edge insertions and deletions; (3) a key-value encoding scheme that exploits the skewness of graph data; and (4) Aster itself, which supports the Gremlin query language. In the reported experiments, Aster outperforms all baseline graph databases, especially on large-scale graphs, and achieves up to 17× higher throughput than the best-performing baseline on the billion-scale Twitter graph.

📝 Abstract

A growing number of applications must manage large-scale, evolving graphs under workloads with intensive graph updates and lookups. Driven by this challenge, we introduce Poly-LSM, a high-performance key-value storage engine for graphs built on the following novel techniques: (1) Poly-LSM adopts a new graph-oriented LSM-tree structure that features a hybrid storage model for storing graph data concisely and effectively. (2) Poly-LSM uses an adaptive mechanism to handle edge insertions and deletions with optimized I/O efficiency. (3) Poly-LSM exploits the skewness of graph data to encode key-value entries. Building on this foundation, we further implement Aster, a robust and versatile graph database that supports the Gremlin query language and thereby facilitates a variety of graph applications. In our experiments, we compare Aster against several mainstream real-world graph databases. The results show that Aster outperforms all baseline graph databases, especially on large-scale graphs. Notably, on the billion-scale Twitter graph dataset, Aster achieves up to a 17x throughput improvement over the best-performing baseline graph system.
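The abstract does not spell out how the hybrid storage model or the adaptive update mechanism work, so the following is only a toy, in-memory sketch of one way such a design could look, not Poly-LSM's actual implementation. Everything in it is an assumption made for illustration: the class name `ToyGraphStore`, the split into full adjacency lists ("pivots") and pending per-edge records ("deltas"), and the simple degree-threshold rule used to choose between the two update paths.

```python
from collections import defaultdict


class ToyGraphStore:
    """Toy model of a hybrid graph layout (hypothetical; not Poly-LSM itself)."""

    def __init__(self, degree_threshold=4):
        # vertex -> set of neighbors: the full adjacency list stored as one value
        self.pivots = {}
        # vertex -> ordered list of (op, neighbor) records not yet merged
        self.deltas = defaultdict(list)
        # Assumed policy knob: at or above this degree, writes become blind appends
        self.degree_threshold = degree_threshold

    def neighbors(self, v):
        """Return the current neighbors: the pivot list with pending deltas replayed."""
        adj = set(self.pivots.get(v, ()))
        for op, n in self.deltas.get(v, ()):
            if op == "+":
                adj.add(n)
            else:
                adj.discard(n)
        return adj

    def _update(self, src, op, dst):
        if len(self.pivots.get(src, ())) >= self.degree_threshold:
            # Long adjacency list: append a small delta record without reading it
            self.deltas[src].append((op, dst))
        else:
            # Short adjacency list: read, modify, and rewrite the whole value
            adj = self.neighbors(src)
            if op == "+":
                adj.add(dst)
            else:
                adj.discard(dst)
            self.pivots[src] = adj
            self.deltas.pop(src, None)

    def add_edge(self, src, dst):
        self._update(src, "+", dst)

    def delete_edge(self, src, dst):
        self._update(src, "-", dst)

    def compact(self):
        """Fold every pending delta into its pivot, as a compaction pass might."""
        for v in list(self.deltas):
            self.pivots[v] = self.neighbors(v)
            del self.deltas[v]


store = ToyGraphStore(degree_threshold=2)
store.add_edge(1, 2)     # rewrites the (short) adjacency list of vertex 1
store.add_edge(1, 3)     # same path; vertex 1 now has degree 2
store.add_edge(1, 4)     # degree reached the threshold: recorded as a delta
store.delete_edge(1, 2)  # also recorded as a delta
print(sorted(store.neighbors(1)))
```

The trade-off this toy model exposes is the usual one in LSM-style designs: blind delta appends keep writes cheap for vertices with long adjacency lists, while lookups must replay pending deltas until a compaction merges them. A real engine would persist these entries in LSM-tree levels rather than Python dictionaries.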

Problem

Research questions and friction points this paper is trying to address.

Large-scale Graph Data

Update Efficiency

Query Performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Poly-LSM Storage System

Graph Data Optimization

Aster Graph Database
