🤖 AI Summary
This work addresses the challenge of missing values in relational databases arising from specific missingness mechanisms. Departing from conventional NULL-based treatments, it introduces a novel approach that explicitly models the missingness mechanism as a missingness graph and integrates it with Bayesian networks to construct block-independent probabilistic databases. Building upon this foundation, the paper proposes two query answering methods that combine probabilistic inference with statistical estimation, effectively capturing result uncertainty while ensuring statistical soundness. Theoretical analysis establishes the computational complexity of the proposed methods, demonstrating their practical feasibility. This framework establishes a new paradigm for semantically meaningful querying over databases with missing data.
📝 Abstract
We address the problems of giving a semantics to- and doing query answering (QA) on a relational database (RDB) that has missing values (MVs). The causes for the latter are governed by a Missingness Mechanism that is modelled as a Bayesian Network, which represents a Missingness Graph (MG) and involves the DB attributes. Our approach considerable departs from the treatment of RDBs with NULL (values). The MG together with the observed DB allow to build a block-independent probabilistic DB, on which basis we propose two QA techniques that jointly capture probabilistic uncertainty and statistical plausibility of the implicit imputation of MVs. We obtain complexity results that characterize the computational feasibility of those approaches.