🤖 AI Summary
This paper studies the circuit and formula complexity of provenance polynomials for Datalog programs over absorptive semirings, asking what depth and size are optimal. On the central question of whether polynomial-size formulas exist, it establishes a dichotomy for several classes of Datalog programs. The dichotomy follows from showing that the optimal circuit depth for these programs is either Θ(log m) or Θ(log² m), where m is the input size. The paper also shows that the *polynomial fringe property* is a sufficient condition for constructing circuits of depth O(log² m). Finally, it characterizes when Datalog programs are bounded over more general semirings. The work combines absorptive semiring theory, Datalog semantics, and circuit complexity analysis.
📝 Abstract
In this paper, we study circuits and formulas for provenance polynomials of Datalog programs. We ask the following question: given an absorptive semiring and a fact of a Datalog program, what is the optimal depth and size of a circuit/formula that computes its provenance polynomial? We focus on absorptive semirings as these guarantee the existence of a polynomial-size circuit. Our main result is a dichotomy for several classes of Datalog programs on whether they admit a formula of polynomial size or not. We achieve this result by showing that for these Datalog programs the optimal circuit depth is either $\Theta(\log m)$ or $\Theta(\log^2 m)$, where $m$ is the input size. We also show that for Datalog programs with the polynomial fringe property, we can always construct circuits of depth $O(\log^2 m)$. Finally, we give characterizations of when Datalog programs are bounded over more general semirings.
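
To make the notion of provenance over an absorptive semiring concrete, the following is a minimal sketch that is not taken from the paper. It assumes the transitive-closure program `T(x,y) :- E(x,y)` and `T(x,y) :- T(x,z), E(z,y)`, and evaluates it over the tropical semiring on non-negative reals, where addition is `min` and multiplication is `+`. The function name `provenance_tc` and the edge data are hypothetical, and the naive fixpoint loop is used only for illustration; it is not one of the circuit constructions discussed above.

```python
import math

# Tropical semiring on non-negative reals: semiring "plus" is min,
# semiring "times" is ordinary +. Its zero is infinity and its one is 0.
# It is absorptive because min(0, a) = 0 for every a >= 0.
ZERO = math.inf

def provenance_tc(edges):
    """Naive fixpoint for T(x,y) :- E(x,y).  T(x,y) :- T(x,z), E(z,y).

    `edges` maps (source, target) to a non-negative annotation.
    Returns the annotation of every derivable T fact.
    """
    t = dict(edges)  # base rule: every E fact is a T fact
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot, since t is updated inside the loop.
        for (x, z), a in list(t.items()):
            for (z2, y), b in edges.items():
                if z != z2:
                    continue
                old = t.get((x, y), ZERO)
                new = min(old, a + b)  # "plus" of old value and a "times" b
                if new < old:
                    t[(x, y)] = new
                    changed = True
    return t

# Hypothetical input: four annotated edges.
edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("a", "c"): 5.0, ("c", "a"): 1.0}
t = provenance_tc(edges)
print(t[("a", "c")])  # 3.0, since the path a -> b -> c beats the direct edge
```

In this semiring the annotation of a fact is the cost of its cheapest derivation, so the cycle through `a`, `b`, and `c` never improves a value once it is found. That mirrors why absorption keeps provenance polynomials finite even for recursive programs.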