AI Summary
This paper establishes the tractability boundary between hierarchical and non-hierarchical queries for the bag-set maximization problem: given a database and a self-join-free Boolean conjunctive query (SJF-BCQ), maximize the value of the query under bag semantics, subject to a budget on the number of facts that may be added from a second given database. Method: The authors show that the problem is solvable in polynomial time when the SJF-BCQ is hierarchical and is an NP-complete optimization problem otherwise. They further unify its algebraic structure with that of probabilistic query evaluation and Shapley value computation for facts, developing a general algebraic framework based on 2-monoids. Contribution/Results: The work identifies the hierarchical property as the precise dichotomy boundary for bag-set maximization and provides a single unified polynomial-time algorithm for all three problems, with a different 2-monoid instantiation for each. This gives one algebraic treatment spanning probabilistic inference, explanation via Shapley values, and optimization.
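To make the dichotomy condition concrete, here is a minimal sketch of the standard test for whether a self-join-free conjunctive query is hierarchical: for every pair of variables, the sets of atoms in which they occur must be nested or disjoint. The query encoding (a dict from relation name to a tuple of variable names) and the function name `is_hierarchical` are assumptions made for this illustration and do not come from the paper.

```python
from itertools import combinations

def is_hierarchical(atoms):
    """atoms: dict mapping relation name -> tuple of variable names (hypothetical encoding)."""
    # occurs[x] is the set of atoms (relation names) in which variable x appears.
    occurs = {}
    for rel, variables in atoms.items():
        for v in variables:
            occurs.setdefault(v, set()).add(rel)
    # Every pair of variables must have nested or disjoint atom sets.
    for x, y in combinations(occurs, 2):
        ax, ay = occurs[x], occurs[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True

# Q1 = R(x), S(x, y): atoms(y) is contained in atoms(x), so Q1 is hierarchical.
q_hier = {"R": ("x",), "S": ("x", "y")}
# Q2 = R(x), S(x, y), T(y): atoms(x) and atoms(y) overlap without nesting.
q_non = {"R": ("x",), "S": ("x", "y"), "T": ("y",)}

print(is_hierarchical(q_hier))  # True
print(is_hierarchical(q_non))   # False
```

The check is quadratic in the number of variables, so deciding which side of the dichotomy a query falls on is cheap compared with solving the problem itself.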
Abstract
The class of hierarchical queries is known to define the boundary of the dichotomy between tractability and intractability for the following two extensively studied problems about self-join-free Boolean conjunctive queries (SJF-BCQ): (i) evaluating a SJF-BCQ on a tuple-independent probabilistic database; (ii) computing the Shapley value of a fact in a database on which a SJF-BCQ evaluates to true. Here, we establish that hierarchical queries also define the boundary of the dichotomy between tractability and intractability for a different natural algorithmic problem, which we call the "bag-set maximization" problem. The bag-set maximization problem associated with a SJF-BCQ $Q$ asks: given a database $\mathcal{D}$, find the biggest value that $Q$ takes under bag semantics on a database $\mathcal{D}'$ obtained from $\mathcal{D}$ by adding at most $\theta$ facts from another given database $\mathcal{D}^r$. For non-hierarchical queries, we show that the bag-set maximization problem is an NP-complete optimization problem. More significantly, for hierarchical queries, we show that all three aforementioned problems (probabilistic query evaluation, Shapley value computation, and bag-set maximization) admit a single unifying polynomial-time algorithm that operates on an abstract algebraic structure, called a "2-monoid". Each of the three problems requires a different instantiation of the 2-monoid tailored for the problem at hand.
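The problem statement can be illustrated with a small brute-force sketch. It is not the polynomial-time 2-monoid algorithm from the paper: it enumerates every choice of at most $\theta$ facts and is exponential in general. It assumes the value of $Q$ under bag semantics is the number of satisfying variable assignments, and the data encoding and the names `count_answers`, `bag_set_max`, `db`, and `repair` are hypothetical.

```python
from itertools import combinations

def count_answers(query, db):
    """Count assignments of the query variables that satisfy every atom."""
    atoms = list(query.items())

    def extend(i, binding):
        if i == len(atoms):
            return 1
        rel, variables = atoms[i]
        total = 0
        for fact in db.get(rel, ()):
            new = dict(binding)
            # The fact matches if it agrees with all variables bound so far.
            if all(new.setdefault(v, c) == c for v, c in zip(variables, fact)):
                total += extend(i + 1, new)
        return total

    return extend(0, {})

def bag_set_max(query, db, repair, theta):
    """Brute force: try every set of at most theta facts taken from repair."""
    candidates = [(rel, fact) for rel, facts in repair.items() for fact in facts]
    best = count_answers(query, db)
    for k in range(1, min(theta, len(candidates)) + 1):
        for chosen in combinations(candidates, k):
            extended = {rel: set(facts) for rel, facts in db.items()}
            for rel, fact in chosen:
                extended.setdefault(rel, set()).add(fact)
            best = max(best, count_answers(query, extended))
    return best

# Q = R(x), S(x, y) over a toy database D and a toy repair database D^r.
query = {"R": ("x",), "S": ("x", "y")}
db = {"R": {(1,)}, "S": {(1, "a"), (2, "a"), (2, "b")}}
repair = {"R": {(2,)}, "S": {(1, "b")}}

print(bag_set_max(query, db, repair, 1))  # 3: adding R(2) unlocks two S-facts
print(bag_set_max(query, db, repair, 2))  # 4: adding both facts
```

With a budget of one, adding R(2) is better than adding S(1, "b") because it activates two existing S-facts at once. This interaction between facts is what makes the choice of insertions non-trivial.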