🤖 AI Summary
This paper addresses a key limitation of rule-based models (e.g., decision trees): their learning algorithms can latch onto spurious correlations and fail to identify causal relationships. To overcome this, we propose a novel method that integrates invariant causal prediction with the set covering machine framework. Our approach is the first to embed environment-invariance constraints into the learning of conjunctions/disjunctions of binary-valued rules, and it introduces a causal sufficiency test that yields polynomial-time guarantees for identifying the causal parents of the target variable. Unlike conventional interpretable models, our method achieves both strong interpretability and causal robustness. We provide formal proofs of correct causal identification and empirically demonstrate significant improvements over state-of-the-art baselines on multiple synthetic and real-world datasets. The method consistently extracts valid causal rules under challenging conditions, including label noise, covariate shift, and other distributional shifts, while preserving computational efficiency and transparency.
📝 Abstract
Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature. However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus are not guaranteed to extract causally relevant insights. In this work, we build on ideas from the invariant causal prediction literature to propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations. We demonstrate both theoretically and empirically that our method can identify the causal parents of a variable of interest in polynomial time.
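The core idea described above, learning a conjunction of binary rules while discarding rules whose relationship with the target changes across environments, can be illustrated with a small sketch. This is not the paper's actual ICSCM algorithm or its causal sufficiency test: the function names (`rule_is_invariant`, `greedy_invariant_conjunction`), the tolerance threshold `tol`, and the simple "compare P(y=1 | rule fires) across environments" check are all illustrative assumptions standing in for the method's formal invariance constraint.

```python
import numpy as np

def rule_is_invariant(X, y, env, j, tol=0.1):
    # Crude proxy for an invariance test: P(y=1 | x_j = 1) should be
    # roughly the same in every environment. (Illustrative assumption,
    # not the paper's causal sufficiency test.)
    rates = []
    for e in np.unique(env):
        mask = (env == e) & (X[:, j] == 1)
        if mask.sum() > 0:
            rates.append(y[mask].mean())
    return len(rates) > 0 and (max(rates) - min(rates)) <= tol

def greedy_invariant_conjunction(X, y, env, max_rules=3, tol=0.1):
    # Set-Covering-Machine-style greedy conjunction: each chosen binary
    # feature must (a) exclude many remaining negatives and (b) pass the
    # invariance check above, so spuriously correlated features are skipped.
    chosen, active = [], np.ones(len(y), dtype=bool)
    for _ in range(max_rules):
        best_j, best_score = None, 0
        for j in range(X.shape[1]):
            if j in chosen or not rule_is_invariant(X, y, env, j, tol):
                continue
            # Requiring x_j = 1 excludes the still-active points with x_j = 0:
            # reward excluded negatives, penalize excluded positives.
            excluded = active & (X[:, j] == 0)
            score = (y[excluded] == 0).sum() - (y[excluded] == 1).sum()
            if score > best_score:
                best_j, best_score = j, score
        if best_j is None:
            break
        chosen.append(best_j)
        active &= X[:, best_j] == 1
    return chosen

# Synthetic two-environment example: x0 is causal (y = x0 everywhere),
# while x1 is spuriously correlated with y in env 0 and anti-correlated
# in env 1, so only x0 should survive the invariance check.
rng = np.random.default_rng(0)
n = 400
env = np.repeat([0, 1], n)
x0 = rng.integers(0, 2, 2 * n)
y = x0.copy()
x1 = np.where(env == 0, y, 1 - y)
X = np.stack([x0, x1], axis=1)
chosen = greedy_invariant_conjunction(X, y, env)
```

On this toy data the spurious feature `x1` fails the invariance check (its conditional rate flips from 1 to 0 between environments), so the learned conjunction consists of the causal feature alone.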