A Sample Efficient Conditional Independence Test in the Presence of Discretization

📅 2025-06-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In practice, continuous variables are often discretized because of measurement constraints, and applying conditional independence (CI) tests directly to the discretized observations can lead to erroneous inferences. Existing approaches binarize the observed data to recover the underlying continuous relationships, which incurs substantial information loss. This paper proposes Discrete-CI, the first method that integrates the generalized method of moments (GMM) with nodewise regression to infer CI among latent continuous variables directly from discrete observations, bypassing binarization entirely. We introduce over-identifying restrictions to improve GMM's statistical efficiency, prove that the resulting CI test is asymptotically consistent, and demonstrate empirically that Discrete-CI significantly outperforms state-of-the-art CI tests (e.g., KCI, RCIT) in small-sample regimes. It is also readily applicable within causal discovery frameworks for discrete data. The implementation is publicly available.

📝 Abstract

In many real-world scenarios, variables of interest are often represented as discretized values due to measurement limitations. Applying Conditional Independence (CI) tests directly to such discretized data, however, can lead to incorrect conclusions. To address this, recent work has sought to infer the correct CI relationships between the latent variables by binarizing the observed data. However, this process inevitably results in a loss of information, which degrades the test's performance. Motivated by this, this paper introduces a sample-efficient CI test that does not rely on binarization. We find that the independence relationships of latent continuous variables can be established by addressing an over-identifying restriction problem with the Generalized Method of Moments (GMM). Based on this insight, we derive an appropriate test statistic and, by leveraging nodewise regression, establish its asymptotic distribution, which correctly reflects CI. Theoretical findings and empirical results across various datasets demonstrate the superiority and effectiveness of our proposed test. Our code implementation is available at https://github.com/boyangaaaaa/DCT
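The claim that testing CI directly on discretized data can mislead is easy to reproduce in a toy setting. The sketch below is not the paper's test: it assumes a linear Gaussian model X <- Z -> Y, uses partial correlation as a stand-in for a CI test, and discretizes only the conditioning variable at arbitrary thresholds. The helper name `partial_corr` and all constants are illustrative assumptions.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing z (plus intercept) out of both."""
    design = np.column_stack([np.ones_like(z, dtype=float), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(rx, ry)[0, 1])

rng = np.random.default_rng(0)
n = 20000

# Latent continuous variables: X and Y depend on each other only through Z,
# so X is independent of Y given Z.
z = rng.normal(size=n)
x = z + 0.5 * rng.normal(size=n)
y = z + 0.5 * rng.normal(size=n)

# Observed version of Z: three ordinal levels (thresholds chosen arbitrarily).
z_disc = np.digitize(z, [-0.5, 0.5]).astype(float)

pc_continuous = partial_corr(x, y, z)        # close to zero
pc_discretized = partial_corr(x, y, z_disc)  # clearly positive

print(f"partial corr given continuous Z:  {pc_continuous:.3f}")
print(f"partial corr given discretized Z: {pc_discretized:.3f}")
```

Conditioning on the coarse version of Z leaves part of Z's variation unexplained, so X and Y remain dependent given the discretized variable even though they are independent given the latent one.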

Problem

Research questions and friction points this paper is trying to address.

Tests conditional independence on discretized data without the information loss of binarization

Uses GMM to infer latent variable relationships

Derives test statistic via nodewise regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses GMM for latent variable independence

Avoids the information loss caused by binarizing the data

Leverages nodewise regression for CI
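As background on why nodewise regression is relevant to CI: for jointly Gaussian variables, regressing each variable on all the others recovers the rows of the precision matrix, and a zero entry there corresponds to conditional independence given the remaining variables. The sketch below checks this identity at the population level on a hand-picked 3-variable chain. It is an illustration of the general technique only, not the paper's estimator, and the function name `nodewise_precision` is hypothetical.

```python
import numpy as np

def nodewise_precision(cov):
    """Rebuild the precision matrix row by row from population nodewise regressions."""
    p = cov.shape[0]
    theta = np.zeros((p, p))
    for j in range(p):
        others = [k for k in range(p) if k != j]
        # Coefficients from regressing variable j on all other variables.
        beta = np.linalg.solve(cov[np.ix_(others, others)], cov[others, j])
        # Residual variance of that regression.
        tau2 = cov[j, j] - cov[j, others] @ beta
        theta[j, j] = 1.0 / tau2
        theta[j, others] = -beta / tau2
    return theta

# Chain X1 - X2 - X3: X1 and X3 are independent given X2,
# which shows up as a zero in position [0, 2] of the precision matrix.
precision = np.array([[2.0, -1.0, 0.0],
                      [-1.0, 3.0, -1.0],
                      [0.0, -1.0, 2.0]])
cov = np.linalg.inv(precision)

theta = nodewise_precision(cov)
print(np.round(theta, 6))
```

In a finite sample the regression coefficients are estimated with noise, which is why a test statistic with a known asymptotic distribution is needed to decide whether an entry is zero.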

🔎 Similar Papers

A Conditional Independence Test in the Presence of Discretization