Causal Discovery on Dependent Mixed Data with Applications to Gene Regulatory Network Inference

📅 2026-03-25
🤖 AI Summary
This work addresses causal discovery when samples are dependent and the variables are a mix of continuous and discrete types. It proposes a decorrelation framework built on a latent-variable structural equation model that allows correlated Gaussian errors across units. An EM algorithm imputes the latent variables underlying discrete observations, and a pairwise maximum likelihood estimator recovers the covariance matrix among samples. A decorrelation transformation then maps the recovered latent data to a form that standard DAG learning algorithms, which assume independent and identically distributed observations, can use. The summary presents the approach as the first to handle mixed variable types and inter-sample dependence jointly. In simulations, the method substantially improves causal graph recovery compared with applying standard methods directly to the dependent data. On single-cell RNA-seq data, the inferred gene regulatory networks achieve higher predictive likelihood on test data, and many high-confidence edges are supported by regulatory interactions reported in the literature.
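
The imputation step can be illustrated with a deliberately simplified case. The sketch below assumes a binary observation generated by thresholding a latent Gaussian at zero, x = 1 if z > 0 with z ~ N(mu, 1), and computes the conditional mean E[z | x] that an E-step would plug in. The thresholding model, the unit variance, and the function names are assumptions made for illustration; the paper's EM algorithm works under sample dependence and is not reproduced here.

```python
import math

def std_normal_pdf(t):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def latent_conditional_mean(mu, x):
    """E[z | x] for z ~ N(mu, 1) and x = 1 if z > 0 else 0 (assumed model)."""
    phi = std_normal_pdf(mu)
    Phi = std_normal_cdf(mu)  # P(x = 1) = P(z > 0)
    if x == 1:
        return mu + phi / Phi          # mean of N(mu, 1) truncated to z > 0
    return mu - phi / (1.0 - Phi)      # mean of N(mu, 1) truncated to z <= 0

# Imputed latent values for a few hypothetical (mu, x) pairs.
imputed = [latent_conditional_mean(mu, x) for mu, x in [(0.0, 1), (0.0, 0), (1.5, 0)]]
```

The two branches are the standard truncated-normal means, so averaging them with weights P(x = 1) and P(x = 0) returns mu, which is a convenient sanity check for any implementation of this step.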

📝 Abstract
Causal discovery aims to infer causal relationships among variables from observational data, typically represented by a directed acyclic graph (DAG). Most existing methods assume independent and identically distributed observations, an assumption often violated in modern applications. In addition, many datasets contain a mixture of continuous and discrete variables, which further complicates causal modeling when dependence across samples is present. To address these challenges, we propose a de-correlation framework for causal discovery from dependent mixed data. Our approach formulates a structural equation model with latent variables that accommodates both continuous and discrete variables while allowing correlated Gaussian errors across units. We estimate the dependence structure among samples via a pairwise maximum likelihood estimator for the covariance matrix and develop an EM algorithm to impute latent variables underlying discrete observations. A de-correlation transformation of the recovered latent data enables the use of standard causal discovery algorithms to learn the underlying causal graph. Simulation studies demonstrate that the proposed method substantially improves causal graph recovery compared with applying standard methods directly to the original dependent data. We apply our approach to single-cell RNA sequencing data to infer gene regulatory networks governing embryonic stem cell differentiation. The inferred regulatory networks show significantly improved predictive likelihood on test data, and many high-confidence edges are supported by known regulatory interactions reported in the literature.
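
As a rough illustration of the de-correlation step, the sketch below whitens the rows of a data matrix given a covariance matrix among samples. It assumes the covariance `Sigma` is already known and positive definite, and uses a Cholesky factor Sigma = L Lᵀ so that L⁻¹X has uncorrelated rows. In the paper the covariance is estimated by a pairwise maximum likelihood estimator; the AR(1)-style `Sigma`, the function name `decorrelate`, and the use of a Cholesky factor are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def decorrelate(X, Sigma):
    """Return L^{-1} X, where Sigma = L L^T is the covariance among samples (rows)."""
    L = np.linalg.cholesky(Sigma)   # lower-triangular factor of Sigma
    return np.linalg.solve(L, X)    # solves L @ X_tilde = X

rng = np.random.default_rng(0)
n, p = 6, 3                         # n dependent samples, p variables

# Hypothetical dependence among samples: correlation decays with index distance.
idx = np.arange(n)
Sigma = 0.7 ** np.abs(idx[:, None] - idx[None, :])

# Simulate dependent data: independent rows E, mixed through the Cholesky factor.
L = np.linalg.cholesky(Sigma)
E = rng.standard_normal((n, p))
X = L @ E                           # each column of X has covariance Sigma across samples

X_tilde = decorrelate(X, Sigma)     # recovers the independent rows E
```

After the transformation, `X_tilde` can be passed to any standard DAG learning routine that expects independent rows. With an estimated rather than known covariance, the recovery is only approximate.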
Problem

Research questions and friction points this paper is trying to address.

causal discovery
dependent data
mixed data
gene regulatory network
observational data
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal discovery
dependent mixed data
de-correlation framework
latent variable model
gene regulatory network