🤖 AI Summary
This paper investigates the fitting problem for conjunctive queries (CQs) and two variants, tree CQs and unions of CQs (UCQs): given data examples labeled as positive or negative, construct a query that classifies them correctly. Because a fitting query is in general not unique, the paper defines and characterizes three extremal kinds of fitting CQ: most-general, most-specific, and unique. It connects these extremal solutions to homomorphism dualities, frontiers, and direct products, and treats CQs, tree CQs, and UCQs along the same lines. It characterizes when each kind of extremal fitting query exists and what structure it has, determines the computational complexity of the associated existence and verification problems (ranging over P, NP, and Π₂^p-completeness), and provides tight size bounds for fitting CQs.
📝 Abstract
The fitting problem for conjunctive queries (CQs) is the problem of constructing a CQ that fits a given set of labeled data examples. When a fitting CQ exists, it is in general not unique. This leads us to propose natural refinements of the notion of a fitting CQ, such as most-general fitting CQ, most-specific fitting CQ, and unique fitting CQ. We give structural characterizations of these notions in terms of (suitable refinements of) homomorphism dualities, frontiers, and direct products, which enable the construction of the refined fitting CQs when they exist. We also pinpoint the complexity of the associated existence and verification problems, and determine the size of fitting CQs. We study the same problems for UCQs and for the more restricted class of tree CQs.
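To make the notion of fitting concrete, here is a minimal Python sketch for the Boolean case only. It assumes that a data example is a finite set of facts, that a Boolean CQ is represented by its canonical instance (its atoms, with variables as values), and that the CQ is true in an example exactly when there is a homomorphism from the query to the example. The names `homomorphism_exists`, `fits`, and `direct_product` are hypothetical helpers introduced for illustration, not taken from the paper, and the brute-force search is exponential, so it is only suitable for toy inputs. The last helper illustrates the standard observation that the direct product of the positive examples is a natural candidate for a most-specific fitting query; the paper's precise characterizations may differ in detail.

```python
from itertools import product

# A fact is (relation_name, tuple_of_values); an instance is a set of facts.
# A Boolean CQ is represented by its canonical instance (assumption).

def homomorphism_exists(source, target):
    """Brute-force test: is there a map on the values of `source` that
    sends every fact of `source` to a fact of `target`?"""
    src_vals = sorted({v for _, args in source for v in args})
    tgt_vals = sorted({v for _, args in target for v in args})
    for image in product(tgt_vals, repeat=len(src_vals)):
        h = dict(zip(src_vals, image))
        if all((rel, tuple(h[a] for a in args)) in target
               for rel, args in source):
            return True
    return False

def fits(query, positives, negatives):
    """The query fits if it holds in every positive example
    and in no negative example."""
    return (all(homomorphism_exists(query, e) for e in positives)
            and not any(homomorphism_exists(query, e) for e in negatives))

def direct_product(a, b):
    """Direct product of two instances: pair up facts of the same relation
    and arity componentwise."""
    return {(r, tuple(zip(x, y)))
            for r, x in a for s, y in b
            if r == s and len(x) == len(y)}

# Toy data over one binary relation E (directed edges).
triangle = {("E", ("a", "b")), ("E", ("b", "c")), ("E", ("c", "a"))}
two_cycle = {("E", ("u", "v")), ("E", ("v", "u"))}
single_edge = {("E", ("a", "b"))}

path2 = {("E", ("x", "y")), ("E", ("y", "z"))}   # q() :- E(x,y), E(y,z)
edge = {("E", ("x", "y"))}                        # q() :- E(x,y)

positives, negatives = [triangle, two_cycle], [single_edge]

print(fits(path2, positives, negatives))  # True
print(fits(edge, positives, negatives))   # False: also holds in the negative example

# The product of the positive examples, read as a query, also fits here.
candidate = direct_product(triangle, two_cycle)
print(fits(candidate, positives, negatives))  # True
```

Both `path2` and `candidate` fit the same examples, which illustrates why a fitting CQ is in general not unique and why refinements such as most-general and most-specific fitting are of interest.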