A Sample Efficient Conditional Independence Test in the Presence of Discretization

📅 2025-06-10
🤖 AI Summary
In practice, continuous variables are often discretized due to measurement constraints, and running conditional independence (CI) tests directly on the discretized observations leads to erroneous inferences. Existing approaches binarize the observations to recover the underlying continuous relationships and suffer substantial information loss as a result. This paper proposes Discrete-CI, a method that combines the generalized method of moments (GMM) with nodewise regression to test CI among latent continuous variables directly from discrete observations, bypassing binarization entirely. Overidentifying restrictions are used to improve the statistical efficiency of the GMM estimator, and the resulting test is proven to be statistically consistent. Empirically, Discrete-CI significantly outperforms state-of-the-art CI tests (e.g., KCI, RCIT) in small-sample regimes. It is also readily applicable within causal discovery frameworks for discrete data. The implementation is publicly available.

📝 Abstract
In many real-world scenarios, variables of interest are recorded as discretized values due to measurement limitations. Applying Conditional Independence (CI) tests directly to such discretized data, however, can lead to incorrect conclusions. To address this, recent work has sought to infer the correct CI relationships between the latent variables by binarizing the observed data. However, binarization inevitably loses information, which degrades the test's performance. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on binarization. We find that the independence relationships of latent continuous variables can be established by solving an over-identifying restriction problem with the Generalized Method of Moments (GMM). Based on this insight, we derive an appropriate test statistic and, by leveraging nodewise regression, establish its asymptotic distribution so that it correctly reflects CI. Theoretical findings and empirical results across various datasets demonstrate the superiority and effectiveness of the proposed test. Our code is available at https://github.com/boyangaaaaa/DCT
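The claim that CI tests on discretized data can reach incorrect conclusions can be illustrated with a small simulation. The sketch below is not the paper's test: it assumes a linear Gaussian model X ← Z → Y (so X and Y are independent given Z), discretizes only the conditioning variable for simplicity, and uses partial correlation as a stand-in CI measure. The helper `partial_corr` and all variable names are hypothetical.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z (plus intercept) out of both."""
    design = np.column_stack([np.ones(len(z)), z])
    resid_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    resid_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(resid_x, resid_y)[0, 1])

rng = np.random.default_rng(0)
n = 20_000

# Assumed toy model: Z is a common cause of X and Y, so X is independent of Y given Z.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Only a coarse, binary version of Z is observed.
z_binary = (z > 0).astype(float)

pc_latent = partial_corr(x, y, z)             # conditioning on the true continuous Z
pc_discretized = partial_corr(x, y, z_binary) # conditioning on the discretized Z

print(f"partial corr given continuous Z:  {pc_latent:.3f}")
print(f"partial corr given discretized Z: {pc_discretized:.3f}")
```

Conditioning on the binary version of Z leaves most of the dependence between X and Y unexplained, so a test run on the discretized data would wrongly reject conditional independence.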
Problem

Research questions and friction points this paper is trying to address.

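CI tests applied directly to discretized data can reach incorrect conclusions
Binarization-based tests of latent-variable CI lose information, which degrades performance
A sample-efficient CI test needs a test statistic whose asymptotic distribution correctly reflects CI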
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GMM for latent variable independence
Avoids data binarization information loss
Leverages nodewise regression for CI
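For readers unfamiliar with over-identifying restrictions in GMM, the following is a generic textbook-style sketch, not the paper's estimator or test statistic. It assumes a toy model with one parameter (a mean `mu`, with variance assumed to be 1) and two moment conditions, fits it by two-step GMM with a crude grid search, and computes Hansen's J statistic, which is asymptotically chi-square with one degree of freedom here (two moments minus one parameter). All function names are hypothetical.

```python
import math
import numpy as np

def moment_matrix(mu, x):
    """Per-sample moment conditions; both have mean zero if E[x] = mu and Var[x] = 1."""
    return np.column_stack([x - mu, x**2 - mu**2 - 1.0])

def gmm_fit(x, weight, grid):
    """Minimise g(mu)' W g(mu) over a grid (crude, but dependency-free)."""
    values = []
    for mu in grid:
        g = moment_matrix(mu, x).mean(axis=0)
        values.append(float(g @ weight @ g))
    best = int(np.argmin(values))
    return float(grid[best]), values[best]

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(loc=1.0, scale=1.0, size=n)  # data satisfying both moment conditions
grid = np.linspace(-3.0, 3.0, 4001)

# Step 1: identity weighting gives a consistent first-stage estimate.
mu_first, _ = gmm_fit(x, np.eye(2), grid)

# Step 2: weight by the inverse covariance of the moments (efficient GMM).
moment_cov = np.cov(moment_matrix(mu_first, x), rowvar=False)
mu_hat, objective = gmm_fit(x, np.linalg.inv(moment_cov), grid)

# Hansen's J statistic; for 1 degree of freedom the chi-square tail is erfc(sqrt(J/2)).
j_stat = n * objective
p_value = math.erfc(math.sqrt(j_stat / 2.0))

print(f"mu_hat = {mu_hat:.3f}, J = {j_stat:.3f}, p-value = {p_value:.3f}")
```

Having more moment conditions than parameters is what makes the J statistic informative: a small p-value indicates that the moment conditions cannot all hold at once, which is the general mechanism by which a hypothesis can be turned into a testable over-identifying restriction.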