🤖 AI Summary
To address the high computational cost of kernelized generalized score functions in large-scale causal discovery, namely cubic time complexity $O(n^3)$ and quadratic space complexity $O(n^2)$, this paper proposes, for the first time, a linear-complexity $O(n)$ approximate kernelized scoring function. Our method integrates low-rank kernel matrix approximation, composite matrix operation reduction, and a data-adaptive random sampling strategy accommodating heterogeneous data types, collectively reducing computational overhead. Experiments on synthetic and real-world benchmarks demonstrate up to three orders of magnitude speedup, substantial memory footprint reduction, and competitive causal graph recovery accuracy relative to state-of-the-art methods. Notably, the approach scales effectively to datasets with millions of samples, establishing a new, efficient, and robust paradigm for scalable causal structure learning.
📄 Abstract
Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which can adapt to a wide range of scenarios and work well in practice because they circumvent assumptions about causal mechanisms and data distributions. Despite these advantages, kernel-based generalized score functions pose serious computational challenges in time and space, with a time complexity of $O(n^3)$ and a memory complexity of $O(n^2)$, where $n$ is the sample size. In this paper, we propose an approximate kernel-based generalized score function with $O(n)$ time and space complexities by using a low-rank technique, designing a set of rules to handle the complex composite matrix operations required to calculate the score, and developing sampling algorithms tailored to different data types so that diverse data can be handled efficiently. Our extensive causal discovery experiments on both synthetic and real-world data demonstrate that, compared to the state-of-the-art method, our method not only significantly reduces computational costs but also achieves comparable accuracy, especially on large datasets.