Datalog with First-Class Facts

📅 2024-11-01
🏛️ Proceedings of the VLDB Endowment
🤖 AI Summary
Datalog natively supports only flat facts over atomic constants, which makes it awkward for modeling and reasoning over recursive, tree-structured data (e.g., ASTs, derivation trees). Existing extensions are imperfect: existential quantification in rule heads (Datalog±, Datalog∃) leads to complex implementations that require unification, while algebraic data types (Soufflé) cannot trigger rule evaluation and do not support efficient indexing. This paper introduces DL$^{\exists!}$, a Datalog with *first-class facts*, in which every fact is identified by a Skolem term unique to that fact. The authors implement DL$^{\exists!}$ as the system Slog, which uses this uniqueness restriction to enable a communication-avoiding, massively parallel implementation built on MPI. The paper reports that Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.

📝 Abstract
Datalog is a popular logic programming language for deductive reasoning tasks in a wide array of applications, including business analytics, program analysis, and ontological reasoning. However, Datalog's restriction to flat facts over atomic constants leads to challenges in working with tree-structured data, such as derivation trees or abstract syntax trees. To ameliorate Datalog's restrictions, popular extensions of Datalog support features such as existential quantification in rule heads (Datalog±, Datalog∃) or algebraic data types (Soufflé). Unfortunately, these are imperfect solutions for reasoning over structured and recursive data types, with general existentials leading to complex implementations requiring unification, and ADTs unable to trigger rule evaluation and failing to support efficient indexing. We present DL$^{\exists!}$, a Datalog with first-class facts, wherein every fact is identified with a Skolem term unique to the fact. We show that this restriction offers an attractive price point for Datalog-based reasoning over tree-shaped data, demonstrating its application to databases, artificial intelligence, and programming languages. We implemented DL$^{\exists!}$ as a system Slog, which leverages the uniqueness restriction of DL$^{\exists!}$ to enable a communication-avoiding, massively-parallel implementation built on MPI. We show that Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.
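To make the idea of a fact "identified with a Skolem term unique to the fact" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Slog's implementation: the names `Ref`, `FactStore`, `intern`, and `evaluate`, and the relations `lit` and `add`, are all hypothetical. Each fact is interned once and receives a unique identity, nested facts refer to their children by identity, and a small fixpoint loop shows facts themselves triggering rule evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Identity of a stored fact (stands in for the fact's Skolem term)."""
    id: int


class FactStore:
    """Hypothetical store in which every distinct fact has exactly one identity."""

    def __init__(self):
        self._ids = {}          # (relation, args) -> Ref
        self._facts = {}        # Ref -> (relation, args)
        self.by_relation = {}   # relation -> list of Ref (a simple index)

    def intern(self, relation, *args):
        """Return the unique Ref for this fact, creating it on first use."""
        key = (relation, args)
        ref = self._ids.get(key)
        if ref is None:
            ref = Ref(len(self._ids))
            self._ids[key] = ref
            self._facts[ref] = key
            self.by_relation.setdefault(relation, []).append(ref)
        return ref

    def lookup(self, ref):
        """Return the (relation, args) pair that a Ref identifies."""
        return self._facts[ref]


def evaluate(store):
    """Naive fixpoint over two illustrative rules:
         value(E, N)     :- E = lit(N).
         value(E, A + B) :- E = add(L, R), value(L, A), value(R, B).
    """
    values = {}
    changed = True
    while changed:
        changed = False
        for ref in store.by_relation.get("lit", []):
            _, (n,) = store.lookup(ref)
            if ref not in values:
                values[ref] = n
                changed = True
        for ref in store.by_relation.get("add", []):
            _, (left, right) = store.lookup(ref)
            if ref not in values and left in values and right in values:
                values[ref] = values[left] + values[right]
                changed = True
    return values


# Build the tree add(lit(1), add(lit(1), lit(2))) out of facts.
store = FactStore()
one = store.intern("lit", 1)
two = store.intern("lit", 2)
expr = store.intern("add", one, store.intern("add", one, two))

# Re-interning the same fact yields the same identity (uniqueness).
same_one = store.intern("lit", 1)

values = evaluate(store)
print(values[expr])  # 4
```

Because identities are assigned by interning, structurally equal subtrees share one `Ref`, and a nested fact can be looked up by relation like any flat fact. A real engine would use semi-naive evaluation and proper indexes rather than this naive loop.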
Problem

Research questions and friction points this paper is trying to address.

Datalog struggles with tree-structured data handling
Existing extensions complicate reasoning over recursive data types
DL$^{\exists!}$ aims to improve efficiency in parallel reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

DL$^{\exists!}$ introduces first-class facts, each identified by a unique Skolem term
The Slog system enables a communication-avoiding, massively parallel implementation built on MPI
Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks
Thomas Gilray
Washington State University
Static Analysis, Language Design, Automated Reasoning, Compilers
Arash Sahebolamri
Syracuse University
Yihao Sun
Syracuse University
Sowmith Kunapaneni
Washington State University
Sidharth Kumar
Associate Professor, University of Illinois at Chicago
HPC, Parallel I/O, Visualization
Kristopher K. Micinski
Syracuse University