One Size Does NOT Fit All: On the Importance of Physical Representations for Datalog Evaluation

📅 2026-02-05

🤖 AI Summary

Existing Datalog engines typically use a single, uniform physical representation for relations, which cannot perform well across the mix of operations (insertions, lookups, and containment checks) that varying workloads demand. This work systematically analyzes how seven dimensions of workload characteristics interact with physical representations, and it introduces a decision-tree-based selection mechanism that matches a recursive Datalog program with a suitable physical representation. The reported experiments show improved evaluation efficiency and identify which workload dimensions govern the choice of representation.
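To make the idea of a decision-tree-based selection concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Workload` fields cover only three of the properties the abstract mentions (size, arity, recursiveness), and the thresholds and representation names (`hash_set`, `unsorted_array`, `sorted_array`, `hash_index`) are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    # Hypothetical feature set: a small subset of the workload
    # properties named in the abstract, not the paper's seven dimensions.
    size: int        # number of tuples in the relation
    arity: int       # number of columns
    recursive: bool  # whether the relation is derived recursively


def choose_representation(w: Workload) -> str:
    """Hand-written decision tree with made-up thresholds.

    The branch structure only illustrates the shape of such a selector;
    the actual splits would have to come from measurements.
    """
    if w.recursive:
        # Assumption: a recursively derived relation grows during
        # evaluation, so cheap inserts and containment checks matter.
        return "hash_set"
    if w.size < 1_000:
        # Assumption: a tiny relation is scanned faster than it is indexed.
        return "unsorted_array"
    # Assumption: narrow static relations sort well; wide ones get an index.
    return "sorted_array" if w.arity <= 2 else "hash_index"


if __name__ == "__main__":
    print(choose_representation(Workload(size=50_000, arity=2, recursive=True)))
    print(choose_representation(Workload(size=200, arity=3, recursive=False)))
```

A learned tree would replace the hand-written branches, but the interface stays the same: workload features in, one representation name per relation out.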

📝 Abstract

Datalog is an increasingly popular recursive query language that is declarative by design, meaning its programs must be translated by an engine into the actual physical execution plan. When generating this plan, a central decision is how to physically represent all involved relations, an aspect in which existing Datalog engines are surprisingly restrictive and often resort to one-size-fits-all solutions. This is problematic because the typical execution plan of a Datalog program does not perform just a single type of operation against the physical representations, but a mixture of operations, such as insertions, lookups, and containment checks. Further, the relevance of each operation type depends heavily on the workload characteristics, which range from familiar properties such as the size, multiplicity, and arity of the individual relations to Datalog-specific properties, such as the "interweaving" of rules when relations occur multiple times, and in particular the recursiveness of the query, which may generate new tuples on the fly during evaluation. This indicates that a variety of physical representations, each with its own strengths and weaknesses, is required to meet the specific needs of different workload situations. To evaluate this, we conduct an in-depth experimental study of the interplay between potentially suitable physical representations and seven dimensions of workload characteristics that vary across actual Datalog programs, revealing which properties actually matter. Based on these insights, we design an automatic selection mechanism that utilizes a set of decision trees to identify suitable physical representations for a given workload.
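The mixture of operations described in the abstract can be seen in a small sketch of semi-naive evaluation for the classic transitive-closure program. This is an illustrative Python toy, not the paper's engine: the function name `transitive_closure`, the operation counters, and the choice of a hash index for `edge` and a hash set for `path` are all assumptions made for the example.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Semi-naive evaluation of the recursive program

        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).

    Returns the path relation and a count of each operation type
    issued against the physical representations.
    """
    ops = {"insert": 0, "lookup": 0, "contains": 0}

    # Assumed representation of edge: hash index on the first column.
    edge_index = defaultdict(list)
    for src, dst in edges:
        edge_index[src].append(dst)

    # Assumed representation of path: a hash set of tuples.
    path = set()
    delta = set()

    # Base rule: every edge is a path (duplicates are filtered out).
    for t in edges:
        ops["contains"] += 1
        if t not in path:
            path.add(t)
            ops["insert"] += 1
            delta.add(t)

    # Recursive rule: join only the newly derived tuples with edge.
    while delta:
        new_delta = set()
        for x, y in delta:
            ops["lookup"] += 1  # index lookup: edges starting at y
            for z in edge_index.get(y, ()):
                t = (x, z)
                ops["contains"] += 1  # containment check for deduplication
                if t not in path:
                    path.add(t)
                    ops["insert"] += 1
                    new_delta.add(t)
        delta = new_delta

    return path, ops


if __name__ == "__main__":
    chain_path, chain_ops = transitive_closure([(1, 2), (2, 3), (3, 4)])
    print(sorted(chain_path), chain_ops)
    cycle_path, cycle_ops = transitive_closure([(1, 2), (2, 1)])
    print(sorted(cycle_path), cycle_ops)
```

On the acyclic chain every containment check leads to an insert, while on the two-node cycle some checks hit tuples that already exist. The ratio between operation types therefore shifts with the input, which is the kind of workload dependence the abstract argues a single fixed representation cannot serve well.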

Problem

Research questions and friction points this paper is trying to address.

Datalog

physical representations

workload characteristics

recursive queries

relation representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Datalog evaluation

physical representation

workload characteristics