Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data

📅 2025-01-31
🤖 AI Summary
Problem: Conventional rank tests used in causal discovery assume that continuous variables are measured perfectly. When some variables are observed only after discretization, these tests can fail to control the Type I error. Method: We propose the Mixed data Permutation-based Rank Test (MPRT), which establishes exchangeability for rank tests on cross-covariance matrices under discretization and estimates the asymptotic null distribution by permutation, so that the Type I error is controlled even when some or all variables are discretized. Results: Experiments on synthetic and real-world data indicate that MPRT controls the Type I error where previous methods do not, and that it is applicable to causal discovery with mixed data, including ordinal variables such as psychometric responses.

📝 Abstract
Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical information about latent variables. Existing rank tests typically assume that all the continuous variables can be perfectly measured, and yet, in practice many variables can only be measured after discretization. For example, in psychometric studies, the continuous level of certain personality dimensions of a person can only be measured after being discretized into order-preserving options such as disagree, neutral, and agree. Motivated by this, we propose Mixed data Permutation-based Rank Test (MPRT), which properly controls the statistical errors even when some or all variables are discretized. Theoretically, we establish the exchangeability and estimate the asymptotic null distribution by permutations; as a consequence, MPRT can effectively control the Type I error in the presence of discretization while previous methods cannot. Empirically, our method is validated by extensive experiments on synthetic data and real-world data to demonstrate its effectiveness as well as applicability in causal discovery.
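
The permutation mechanism mentioned in the abstract can be illustrated with a minimal sketch. This is not MPRT itself: it is a generic permutation test for the simplest case, a rank-0 cross-covariance (no linear dependence between two variable blocks), applied to discretized data. The function names `cross_cov_stat`, `permutation_pvalue`, and `discretize`, the cut points, and the choice of statistic are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def cross_cov_stat(x, y):
    """Sum of squared singular values of the sample cross-covariance.

    This equals the squared Frobenius norm and is zero only when the
    sample cross-covariance matrix has rank 0.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cov = xc.T @ yc / (len(x) - 1)
    singular_values = np.linalg.svd(cov, compute_uv=False)
    return float(np.sum(singular_values ** 2))


def permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the rank-0 null (illustrative, not MPRT).

    Shuffling the rows of y breaks any pairing with x, so the shuffled
    statistics approximate the null distribution of the statistic.
    """
    rng = np.random.default_rng(seed)
    observed = cross_cov_stat(x, y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        if cross_cov_stat(x, y[perm]) >= observed:
            exceed += 1
    # The +1 terms keep the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def discretize(z, cuts=(-0.5, 0.5)):
    """Map continuous values to ordered levels (hypothetical cut points),
    mimicking options such as disagree / neutral / agree."""
    return np.digitize(z, cuts).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(300, 2))
    y = x + 0.5 * rng.normal(size=(300, 2))  # dependent on x by construction
    print(permutation_pvalue(discretize(x), discretize(y)))
```

Testing a general null of the form "rank at most r" with r > 0 is harder, because naive row shuffling destroys the low-rank dependence that the null allows. Handling that case under discretization is the setting the abstract describes for MPRT.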
Problem

Research questions and friction points this paper is trying to address.

Causal Inference
Mixed Data
Statistical Testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixed-data Causal Inference
MPRT Statistical Test
Error Control in Discretized Data