🤖 AI Summary
Datalog natively supports only flat facts over atomic constants, which makes it awkward to model and reason over tree-structured, recursive data such as abstract syntax trees (ASTs) and derivation trees. Existing extensions fall short in different ways: existential quantification in rule heads (Datalog±) leads to complex implementations that require unification, while algebraic data types (Soufflé) cannot trigger rule evaluation and do not support efficient indexing. This paper introduces $DL^{\exists!}$, a Datalog with *first-class facts*, in which every fact is identified by a Skolem term unique to that fact. The uniqueness restriction enables native modeling of structured data and a communication-avoiding, massively parallel implementation on MPI, called Slog. On a variety of benchmarks, Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) and shows the potential to scale to thousands of threads.
📝 Abstract
Datalog is a popular logic programming language for deductive reasoning tasks in a wide array of applications, including business analytics, program analysis, and ontological reasoning. However, Datalog's restriction to flat facts over atomic constants leads to challenges in working with tree-structured data, such as derivation trees or abstract syntax trees. To ameliorate Datalog's restrictions, popular extensions of Datalog support features such as existential quantification in rule heads (Datalog$^\pm$, Datalog$^\exists$) or algebraic data types (Soufflé). Unfortunately, these are imperfect solutions for reasoning over structured and recursive data types, with general existentials leading to complex implementations requiring unification, and ADTs unable to trigger rule evaluation and failing to support efficient indexing.
We present $DL^{\exists!}$, a Datalog with first-class facts, wherein every fact is identified with a Skolem term unique to the fact. We show that this restriction offers an attractive price point for Datalog-based reasoning over tree-shaped data, demonstrating its application to databases, artificial intelligence, and programming languages. We implemented $DL^{\exists!}$ as a system, Slog, which leverages the uniqueness restriction of $DL^{\exists!}$ to enable a communication-avoiding, massively-parallel implementation built on MPI. We show that Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.