🤖 AI Summary
Problem: Existing database attribution methods, such as Shapley values, fail to accurately quantify how strongly individual tuples causally contribute to query results.
Method: We propose the Causal-Effect Score (CES), the first framework to integrate structural causal models (SCMs) and counterfactual reasoning into data management, enabling unified tuple-level causal attribution for both deterministic and probabilistic databases. CES combines query semantics modeling, probabilistic inference, and efficient approximation algorithms.
Contribution/Results: We provide an axiomatized definition of CES, analyze its computational complexity, and prove it satisfies key causal properties—including causal sensitivity and consistency. Experiments demonstrate that CES significantly outperforms baseline methods in attribution accuracy while maintaining strong scalability. By bridging causal inference and database systems, CES establishes a novel paradigm for interpretable, causally grounded database explanations.
📝 Abstract
The Causal Effect (CE) is a numerical measure of the causal influence of variables on observed results. Although CE is widely used in many areas, only preliminary attempts have been made to use it as an attribution score in data management, where it measures the causal strength of tuples for query answering in databases. In this work, we introduce, generalize, and investigate the so-called Causal-Effect Score in the context of classical and probabilistic databases.
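To make the idea of a tuple-level causal effect concrete, the following is a minimal sketch, not the paper's definition. It assumes a formulation common in the causality literature: every tuple is treated as endogenous and is kept independently with probability 1/2, and the score of a tuple is the expected answer of a Boolean query when the tuple is forced into the database minus the expected answer when it is forced out. The toy relations `R` and `S`, the query, and the names `causal_effect` and `query` are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

# Toy database (hypothetical): R(a, b), R(c, b), S(b).
DB = [("R", "a", "b"), ("R", "c", "b"), ("S", "b")]


def query(world):
    """Boolean query: is there an R(x, y) with a matching S(y) in `world`?"""
    s_values = {t[1] for t in world if t[0] == "S"}
    return any(t[0] == "R" and t[2] in s_values for t in world)


def causal_effect(tuples, boolean_query, target):
    """E[Q | target forced in] - E[Q | target forced out].

    Assumption: every other tuple is present independently with
    probability 1/2, so all sub-databases are equally likely and the
    expectation is a plain average over them (exponential enumeration,
    suitable only for tiny examples).
    """
    others = [t for t in tuples if t != target]
    hits_with = hits_without = 0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            world = set(subset)
            hits_with += boolean_query(world | {target})
            hits_without += boolean_query(world)
    return Fraction(hits_with - hits_without, 2 ** len(others))


if __name__ == "__main__":
    for t in DB:
        print(t, causal_effect(DB, query, t))
```

Under these assumptions the `S` tuple scores higher than either `R` tuple, because the query can never hold without it, whereas each `R` tuple can be substituted by the other.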