AI Summary
Existing causal discovery methods often rely on strong assumptions or interventional data, or fail to integrate domain knowledge, which limits their practical deployment. This work proposes a pretraining framework that incorporates weak prior domain knowledge in a principled manner, and is presented as the first approach to systematically embed coarse-grained priors into causal discovery. The method combines a dual-source encoder-decoder architecture with a curriculum learning strategy to jointly model observational data and prior knowledge, adapting to varying prior strengths, graph densities, and variable scales. Evaluated on in-distribution, out-of-distribution, and real-world datasets, the approach consistently improves over existing baselines and shows strong robustness and practical applicability.
Abstract
Causal discovery has been widely studied, yet many existing methods rely on strong assumptions or fall into one of two extremes: they either depend on costly interventional signals or partial ground truth as strong priors, or adopt purely data-driven paradigms with limited guidance. Both extremes hinder practical deployment. Motivated by real-world scenarios where only coarse domain knowledge is available, we propose a knowledge-informed pretrained model for causal discovery that integrates weak prior knowledge as a principled middle ground. Our model adopts a dual-source encoder-decoder architecture to process observational data in a knowledge-informed way. We design a diverse pretraining dataset and a curriculum learning strategy that smoothly adapts the model to varying prior strengths across mechanisms, graph densities, and variable scales. Extensive experiments on in-distribution, out-of-distribution, and real-world datasets demonstrate consistent improvements over existing baselines, with strong robustness and practical applicability.
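To make the idea of a curriculum over prior strength, graph density, and variable scale more concrete, the following is a minimal sketch of how synthetic pretraining tasks might be sampled. It is not the authors' pipeline: the function names (`curriculum_config`, `sample_dag`, `weak_prior`), the numeric ranges, and the encoding of weak knowledge as a partial edge-hint matrix are all assumptions made for illustration.

```python
import random


def curriculum_config(progress):
    """Map training progress in [0, 1] to task settings.

    All ranges below are hypothetical: tasks start small, sparse, and
    well-informed, then grow larger, denser, and less informed.
    """
    progress = min(max(progress, 0.0), 1.0)
    return {
        "num_vars": round(5 + progress * 15),   # variable scale
        "edge_prob": 0.1 + progress * 0.3,      # graph density
        "prior_keep": 0.9 - progress * 0.7,     # prior strength
    }


def sample_dag(num_vars, edge_prob, rng):
    """Sample a random DAG as an adjacency matrix (adj[i][j] = 1 means i -> j).

    Edges only point forward along a random variable ordering, which
    guarantees acyclicity.
    """
    order = list(range(num_vars))
    rng.shuffle(order)
    adj = [[0] * num_vars for _ in range(num_vars)]
    for a in range(num_vars):
        for b in range(a + 1, num_vars):
            if rng.random() < edge_prob:
                adj[order[a]][order[b]] = 1
    return adj


def weak_prior(adj, keep_prob, rng):
    """Build a coarse prior that reveals only a fraction of the true edges.

    Assumed encoding: 1 = edge hinted by domain knowledge, 0 = no information.
    """
    n = len(adj)
    return [
        [1 if adj[i][j] and rng.random() < keep_prob else 0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0.0, 0.5, 1.0):
        cfg = curriculum_config(step)
        graph = sample_dag(cfg["num_vars"], cfg["edge_prob"], rng)
        prior = weak_prior(graph, cfg["prior_keep"], rng)
        hinted = sum(map(sum, prior))
        total = sum(map(sum, graph))
        print(f"progress={step}: {cfg['num_vars']} vars, {hinted}/{total} edges hinted")
```

In this sketch the prior is a strict subset of the true edges, so it is weak only in the sense of being incomplete. A model of noisy or coarse-grained knowledge (for example, group-level orderings) would need a different encoding.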