Causal Discovery over High-Dimensional Structured Hypothesis Spaces with Causal Graph Partitioning

📅 2024-06-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Traditional causal discovery methods become computationally intractable in high-dimensional settings (up to 10⁴ variables) because the space of candidate directed acyclic graphs grows super-exponentially with the number of variables. Method: This paper proposes a superstructure-guided causal graph partitioning framework that decomposes the global causal search into parallelizable subproblems, enabling divide-and-conquer learning that, under the paper's stated assumptions, provably recovers the Markov equivalence class of the true causal graph. Contribution/Results: On biologically tuned synthetic networks and networks of up to 10⁴ variables, the method reaches comparable accuracy with a faster time to solution, which makes it applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.
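
To give a sense of how quickly the search space grows, the number of labeled directed acyclic graphs on n nodes can be computed with Robinson's recurrence. The sketch below is illustrative only and is not taken from the paper; the function name `count_dags` is introduced here for the example.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence.

    a(0) = 1
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k)
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    # Print the count for a handful of small graph sizes.
    for n in range(1, 8):
        print(n, count_dags(n))
```

Four variables already admit 543 candidate graphs and five admit 29281, so exhaustive search is out of reach long before 10⁴ variables.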

📝 Abstract
The aim in many sciences is to understand the mechanisms that underlie the observed distribution of variables, starting from a set of initial hypotheses. Causal discovery allows us to infer mechanisms as sets of cause-and-effect relationships in a generalized way -- without necessarily tailoring to a specific domain. Causal discovery algorithms search over a structured hypothesis space, defined by the set of directed acyclic graphs, to find the graph that best explains the data. For high-dimensional problems, however, this search becomes intractable, and scalable algorithms for causal discovery are needed to bridge the gap. In this paper, we define a novel causal graph partition that allows for divide-and-conquer causal discovery with theoretical guarantees. We leverage the idea of a superstructure -- a set of learned or existing candidate hypotheses -- to partition the search space. We prove under certain assumptions that learning with a causal graph partition always yields the Markov Equivalence Class of the true causal graph. We show our algorithm achieves comparable accuracy and a faster time to solution for biologically-tuned synthetic networks and networks of up to $10^4$ variables. This makes our method applicable to gene regulatory network inference and other domains with high-dimensional structured hypothesis spaces.
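
The guarantee above is stated in terms of a Markov equivalence class rather than a single graph. A standard characterization is that two DAGs over the same variables are Markov equivalent exactly when they share the same skeleton and the same v-structures (colliders whose parents are not adjacent). The sketch below checks that condition. It is illustrative, not code from the paper, and all function names are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, child, b) where a -> child <- b and a, b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    found = set()
    for child, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges_1, edges_2):
    """Assumes both edge lists describe DAGs over the same node set."""
    return (skeleton(edges_1) == skeleton(edges_2)
            and v_structures(edges_1) == v_structures(edges_2))


chain = [("X", "Y"), ("Y", "Z")]           # X -> Y -> Z
reversed_chain = [("Z", "Y"), ("Y", "X")]  # X <- Y <- Z
collider = [("X", "Y"), ("Z", "Y")]        # X -> Y <- Z

print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal encode the same conditional independencies, so observational data alone cannot separate them. The collider can be told apart.
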
Problem

Research questions and friction points this paper is trying to address.

Develop scalable causal discovery for high-dimensional data.
Introduce causal graph partitioning for efficient hypothesis search.
Apply the method to gene regulatory network inference and other large datasets.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal graph partitioning for scalable discovery
Divide-and-conquer approach with theoretical guarantees
Superstructure-based search space partitioning
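
The general shape of a superstructure-based divide-and-conquer search can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: the overlap rule (each block plus its superstructure neighbors), the plain union used for merging, and the oracle learner are all hypothetical stand-ins, and every name in the code is introduced here for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def expand_with_neighbors(block, superstructure):
    """Overlapping subset: the block plus its neighbors in the superstructure."""
    expanded = set(block)
    for node in block:
        expanded |= superstructure.get(node, set())
    return expanded


def divide_and_conquer(superstructure, blocks, local_learner):
    """Run a local learner on each overlapping subset, then merge the results.

    Merging here is a plain union of directed edges (an assumption made for
    this sketch, not the merge rule from the paper).
    """
    subsets = [expand_with_neighbors(b, superstructure) for b in blocks]
    with ThreadPoolExecutor() as pool:
        local_graphs = list(pool.map(local_learner, subsets))
    merged = set()
    for edges in local_graphs:
        merged |= set(edges)
    return merged


# Toy setup: the true graph is the chain A -> B -> C -> D -> E.
true_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

# Superstructure: undirected candidate edges (here, exactly the true skeleton).
superstructure = {}
for u, v in true_edges:
    superstructure.setdefault(u, set()).add(v)
    superstructure.setdefault(v, set()).add(u)


def oracle_learner(subset):
    """Stand-in for a real causal discovery routine run on one subset."""
    return [(u, v) for u, v in true_edges if u in subset and v in subset]


blocks = [{"A", "B"}, {"C", "D", "E"}]
learned = divide_and_conquer(superstructure, blocks, oracle_learner)
print(sorted(learned))
```

With a real learner in place of the oracle, subproblems can disagree on shared edges or pick up spurious ones because variables outside the subset are unobserved, so how the partition and merge are defined matters for correctness.
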
Ashka Shah
Department of Computer Science, University of Chicago; Data Science and Learning Division, Argonne National Laboratory
Adela DePavia
CCAM student at UChicago
Algorithms, Optimization, Applied Mathematics
Nathaniel Hudson
Assistant Professor, Illinois Institute of Technology
Edge Computing, Edge Intelligence, Internet-of-Things, Social Networks, Cyber-Physical Systems
Ian Foster
Department of Computer Science, University of Chicago; Data Science and Learning Division, Argonne National Laboratory
Rick Stevens
Professor of Computer Science, University of Chicago
HPC, Bioinformatics, Distributed Computing, Visualization, Collaboration