PG-HIVE: Hybrid Incremental Schema Discovery for Property Graphs

📅 2025-11-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The schema-less nature of property graphs hinders data integration, querying, and visualization, so node and edge types, property datatypes, and constraints must be discovered automatically, without relying on explicit schema annotations. To address this, the paper proposes PG-HIVE, a framework for hybrid, incremental schema discovery that combines locality-sensitive hashing (LSH) with property- and label-based clustering. This lets the schema evolve as new data arrives without full recomputation, and discovery works even when explicit label information is absent. In the reported experiments, PG-HIVE improves accuracy over state-of-the-art methods by up to 65% for node types and up to 40% for edge types, and runs up to 1.95× faster.

📝 Abstract

Property graphs have rapidly become the de facto standard for representing and managing complex, interconnected data, powering applications across domains from knowledge graphs to social networks. Despite the advantages, their schema-free nature poses major challenges for integration, exploration, visualization, and efficient querying. To bridge this gap, we present PG-HIVE, a novel framework for automatic schema discovery in property graphs. PG-HIVE goes beyond existing approaches by uncovering latent node and edge types, inferring property datatypes, constraints, and cardinalities, and doing so even in the absence of explicit labeling information. Leveraging a unique combination of Locality-Sensitive Hashing with property- and label-based clustering, PG-HIVE identifies structural similarities at scale. Moreover, it introduces incremental schema discovery, eliminating costly recomputation as new data arrives. Through extensive experimentation, we demonstrate that PG-HIVE consistently outperforms state-of-the-art solutions, in both accuracy (by up to 65% for nodes and 40% for edges), and efficiency (up to 1.95x faster execution), unlocking the full potential of schema-aware property graph management.

Problem

Research questions and friction points this paper is trying to address.

Automatically discovers latent node and edge types in property graphs

Infers property datatypes, constraints, and cardinalities without explicit labels

Enables incremental schema discovery to avoid costly recomputation on new data
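To make the last two points concrete, the following is a minimal sketch of how property datatypes and a mandatory/optional constraint could be maintained incrementally for one already-discovered node type. It is not PG-HIVE's algorithm: the `TypeSummary` class, the `infer_type` helper, and the coarse datatype names are all hypothetical and chosen only for illustration. The idea it shows is that keeping running counts per property lets new nodes update the schema without rescanning earlier data.

```python
from collections import defaultdict


def infer_type(value):
    """Map a Python value to a coarse datatype name (hypothetical names)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "FLOAT"
    return "STRING"


class TypeSummary:
    """Running summary for one node type (illustrative, not PG-HIVE's structure)."""

    def __init__(self):
        self.count = 0                      # nodes seen so far
        self.key_counts = defaultdict(int)  # property key -> nodes carrying it
        self.datatypes = defaultdict(set)   # property key -> observed datatypes

    def add(self, props):
        """Fold one new node's properties into the summary."""
        self.count += 1
        for key, value in props.items():
            self.key_counts[key] += 1
            self.datatypes[key].add(infer_type(value))

    def schema(self):
        """Derive the current schema from the running counts."""
        return {
            key: {
                "datatypes": sorted(self.datatypes[key]),
                # A property is mandatory if every node seen so far has it.
                "mandatory": self.key_counts[key] == self.count,
            }
            for key in sorted(self.key_counts)
        }


# Initial batch, then a later node arrives and only the counters change.
person = TypeSummary()
person.add({"name": "Ada", "age": 36})
person.add({"name": "Alan"})
person.add({"name": "Grace", "age": 45.5})
schema = person.schema()
```

In this sketch, `name` stays mandatory because all three nodes carry it, while `age` is optional and has been observed with two datatypes. A real system would also need a rule for reconciling conflicting datatypes, which this sketch leaves open.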

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid incremental schema discovery for property graphs

Combines Locality-Sensitive Hashing with property- and label-based clustering

Infers node types, edge types, property datatypes, constraints, and cardinalities
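As a rough illustration of how Locality-Sensitive Hashing can group structurally similar nodes, the sketch below applies MinHash with banding to each node's set of property keys. This is a generic textbook technique, not the specific hashing scheme used by PG-HIVE, which the summary above does not detail. The function names (`minhash`, `lsh_candidates`), the parameters (`num_hashes`, `bands`), and the choice of property keys as tokens are assumptions made for the example.

```python
import hashlib
from collections import defaultdict


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(tokens, num_hashes=16):
    """MinHash signature: the minimum hash of the token set per seed."""
    # default=0 keeps nodes without any properties from raising an error.
    return tuple(
        min((_hash(seed, t) for t in tokens), default=0)
        for seed in range(num_hashes)
    )


def lsh_candidates(items, num_hashes=16, bands=4):
    """Return pairs of item ids that share at least one LSH band bucket.

    num_hashes is assumed to be divisible by bands.
    """
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for item_id, tokens in items.items():
        sig = minhash(tokens, num_hashes)
        for b in range(bands):
            band = sig[b * rows:(b + 1) * rows]
            buckets[(b, band)].add(item_id)

    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                pairs.add((first, second))
    return pairs


# Hypothetical nodes described only by their property keys.
nodes = {
    "n1": {"name", "age"},
    "n2": {"age", "name"},
    "n3": {"title", "year"},
}
pairs = lsh_candidates(nodes)
```

Nodes with identical property-key sets always produce the same signature, so `n1` and `n2` land in the same buckets and become a candidate pair. Nodes with partially overlapping keys become candidates with a probability that grows with their Jaccard similarity, and only candidate pairs would then be passed to a more expensive clustering step.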

🔎 Similar Papers

Schema-Based Query Optimisation for Graph Databases