Datalog with First-Class Facts

📅 2024-11-01

🏛️ Proceedings of the VLDB Endowment

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Datalog natively supports only flat facts over atomic constants, which makes modeling and reasoning over recursive, tree-shaped structures (e.g., abstract syntax trees, derivation trees) cumbersome. Existing extensions fall short. Existential quantification in rule heads (Datalog±, Datalog∃) leads to complex implementations that require unification. Algebraic data types (Soufflé) can neither trigger rule evaluation nor support efficient indexing. This paper introduces first-class facts: every fact is identified by a Skolem term unique to that fact, so structured data is built from facts that rules can match, index, and fire on directly. The resulting language, DL∃!, is implemented in the Slog system, which exploits this uniqueness restriction to obtain a communication-avoiding, massively parallel implementation built on MPI. On a variety of benchmarks, Slog outperforms leading systems, including Nemo, VLog, RDFox, and Soufflé, and shows the potential to scale to thousands of threads.
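The idea of identifying each fact by a term unique to it can be approximated with hash-consing (interning). A table maps each relation name plus argument tuple to a single ID. Deriving the same fact twice then yields the same ID, and a tree is stored as facts whose arguments are the IDs of other facts. The sketch below is a minimal illustration under that assumption, not Slog's implementation. `Ref`, `FactStore`, `intern`, and `show` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Hypothetical handle standing in for a fact's Skolem-term identity."""
    id: int


@dataclass
class FactStore:
    ids: dict = field(default_factory=dict)    # (relation, args) -> Ref
    facts: list = field(default_factory=list)  # Ref.id -> (relation, args)

    def intern(self, relation, *args):
        """Return the unique Ref for relation(*args), creating it on first use."""
        key = (relation, args)
        if key not in self.ids:
            self.ids[key] = Ref(len(self.facts))
            self.facts.append(key)
        return self.ids[key]

    def show(self, ref):
        """Render a fact back into nested-term syntax by following Ref arguments."""
        relation, args = self.facts[ref.id]
        parts = [self.show(a) if isinstance(a, Ref) else repr(a) for a in args]
        return f"{relation}({', '.join(parts)})"


store = FactStore()
one = store.intern("lit", 1)
two = store.intern("lit", 2)
expr = store.intern("add", one, two)

# Building the same tree again yields the same IDs, so no duplicate facts are stored.
again = store.intern("add", store.intern("lit", 1), store.intern("lit", 2))
print(store.show(expr))   # add(lit(1), lit(2))
print(len(store.facts))   # 3
```

A subtree that occurs in several places is stored once, and two trees are equal exactly when their IDs are equal.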

📝 Abstract

Datalog is a popular logic programming language for deductive reasoning tasks in a wide array of applications, including business analytics, program analysis, and ontological reasoning. However, Datalog's restriction to flat facts over atomic constants leads to challenges in working with tree-structured data, such as derivation trees or abstract syntax trees. To ameliorate Datalog's restrictions, popular extensions of Datalog support features such as existential quantification in rule heads (Datalog±, Datalog∃) or algebraic data types (Soufflé). Unfortunately, these are imperfect solutions for reasoning over structured and recursive data types, with general existentials leading to complex implementations requiring unification, and ADTs unable to trigger rule evaluation and failing to support efficient indexing. We present DL∃!, a Datalog with first-class facts, wherein every fact is identified with a Skolem term unique to the fact. We show that this restriction offers an attractive price point for Datalog-based reasoning over tree-shaped data, demonstrating its application to databases, artificial intelligence, and programming languages. We implemented DL∃! as a system, Slog, which leverages the uniqueness restriction of DL∃! to enable a communication-avoiding, massively parallel implementation built on MPI. We show that Slog outperforms leading systems (Nemo, VLog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.
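The abstract's contrast between ADTs and first-class facts can be illustrated with a toy evaluator. Each subexpression of the tree is itself a fact with its own ID. A rule can therefore fire on every `add` fact and join on the values already derived for its children through an index keyed by fact ID. This is a minimal sketch that assumes a naive fixpoint loop, chosen for brevity (production Datalog engines typically use semi-naive evaluation). The rule notation in the comments is hypothetical and is neither Slog's syntax nor its evaluation strategy. `Ref`, `FactStore`, and `evaluate` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for a fact's unique identity."""
    id: int


@dataclass
class FactStore:
    ids: dict = field(default_factory=dict)    # (relation, args) -> Ref
    facts: list = field(default_factory=list)  # Ref.id -> (relation, args)

    def intern(self, relation, *args):
        key = (relation, args)
        if key not in self.ids:
            self.ids[key] = Ref(len(self.facts))
            self.facts.append(key)
        return self.ids[key]


def evaluate(store):
    """Naive fixpoint for two hypothetical rules:
         eval(E, V)     :- E = lit(V).
         eval(E, A + B) :- E = add(L, R), eval(L, A), eval(R, B).
    Every subterm is a fact with its own Ref, so the second rule fires on
    each add fact and looks up its children's values by Ref."""
    value = {}  # index: expression Ref -> derived eval value
    changed = True
    while changed:
        changed = False
        for i, (relation, args) in enumerate(store.facts):
            ref = Ref(i)
            if ref in value:
                continue  # already derived; nothing new to add
            if relation == "lit":
                value[ref] = args[0]
                changed = True
            elif relation == "add":
                left, right = args
                if left in value and right in value:
                    value[ref] = value[left] + value[right]
                    changed = True
    return value


store = FactStore()
lhs = store.intern("add", store.intern("lit", 1), store.intern("lit", 2))
root = store.intern("add", lhs, store.intern("lit", 4))
values = evaluate(store)
print(values[lhs])   # 3
print(values[root])  # 7
```

With an ADT, the nested value `add(lit(1), lit(2))` would sit inside a single tuple, and no rule would fire on its subterms unless they were explicitly copied into a relation.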

Problem

Research questions and friction points this paper is trying to address.

Datalog struggles to handle tree-structured data

Existing extensions complicate reasoning over recursive data types

DL∃! aims to improve the efficiency of parallel reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

DL∃! introduces first-class facts, each identified by a unique Skolem term

The Slog system provides a communication-avoiding, massively parallel implementation using MPI

Slog, the DL∃! implementation, outperforms leading systems (Nemo, VLog, RDFox, and Soufflé) on a variety of benchmarks
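One way a uniqueness restriction can reduce coordination in a distributed setting is to derive each fact's owner from its content alone. The sketch below uses plain hash partitioning. This is an assumption for illustration, not a description of Slog's MPI data layout. Every rank computes the same owner for a given fact without exchanging messages, and the owner deduplicates facts that several ranks derive. `owner_rank` and `route` are hypothetical helpers, and the ranks are simulated in a single process rather than run under MPI.

```python
import hashlib


def owner_rank(relation, args, num_ranks):
    """Map a fact to the rank that stores it, using only the fact's content.
    A stable digest is used because Python's built-in hash() is salted per
    process for strings, so separate ranks could disagree on the owner."""
    key = repr((relation, args)).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_ranks


def route(derived_by_rank, num_ranks):
    """Send each locally derived fact to its owner; each owner keeps a set,
    so a fact derived on several ranks is stored exactly once."""
    inbox = [set() for _ in range(num_ranks)]
    for facts in derived_by_rank:
        for relation, args in facts:
            inbox[owner_rank(relation, args, num_ranks)].add((relation, args))
    return inbox


NUM_RANKS = 4
derived = [
    [("lit", (1,)), ("lit", (2,))],  # derived on rank 0
    [("lit", (1,))],                 # rank 1 re-derives lit(1)
    [],                              # rank 2 derives nothing
    [("lit", (2,)), ("lit", (3,))],  # rank 3 re-derives lit(2)
]
inbox = route(derived, NUM_RANKS)
print(sum(len(s) for s in inbox))  # 3: lit(1), lit(2), lit(3), each stored once
```

In a real MPI program, the `inbox` step would correspond to an exchange such as an all-to-all. The point of the sketch is only that deciding where a fact lives requires no prior agreement between ranks.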
