AI Summary
Identifying the causal direction under heteroscedastic symmetric noise models (HSNMs) remains challenging because the noise distribution is not specified explicitly and latent confounders may be present.
Method: We propose a skewness-based identifiability criterion: under the true causal direction ($X \rightarrow Y$), the skewness of the score function (i.e., the gradient of the log-density) vanishes, whereas it is nonzero under the anticausal direction. Leveraging this, we design SkewScore, a computationally efficient, noise-agnostic algorithm that directly estimates score-function skewness without explicit noise modeling.
Contribution/Results: This is the first work to introduce skewness statistics into HSNM-based causal inference. The criterion extends to the multivariate setting, and a case study of a bivariate model with a latent confounder gives theoretical insight into SkewScore's robustness. Experiments demonstrate its robustness under heteroscedastic noise and latent confounding, consistently outperforming state-of-the-art baselines.
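A short derivation may help show why a score-based third moment can vanish in the causal direction. It is a sketch under stated assumptions, not the paper's own proof: the model is $Y = f(X) + \sigma(X)N$ with $N$ independent of $X$, $\sigma > 0$, and $p_N$ symmetric and differentiable; all moments below are assumed finite; and "skewness of the score" is read as the third moment of the score component taken with respect to the effect variable $Y$. The symbol $\psi$ is introduced here only for illustration.

```latex
\begin{align*}
p(x, y) &= p_X(x)\,\frac{1}{\sigma(x)}\,
           p_N\!\left(\frac{y - f(x)}{\sigma(x)}\right), \\
\partial_y \log p(x, y) &= \frac{1}{\sigma(x)}\,
           \psi\!\left(\frac{y - f(x)}{\sigma(x)}\right),
           \qquad \psi(n) := \frac{p_N'(n)}{p_N(n)}, \\
\mathbb{E}\!\left[\bigl(\partial_y \log p(X, Y)\bigr)^{3}\right]
        &= \mathbb{E}\!\left[\sigma(X)^{-3}\right]\,
           \mathbb{E}\!\left[\psi(N)^{3}\right] = 0 .
\end{align*}
```

The last equality uses two facts: $X$ and $N$ are independent, so the expectation factorizes, and $p_N$ is even, so $\psi$ and $\psi^3$ are odd and integrate to zero against the symmetric density $p_N$. No such cancellation is forced for $\partial_x \log p(x, y)$, which mixes $p_X$, $f$, and $\sigma$.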
Abstract
Real-world data often violate the equal-variance assumption (homoscedasticity), making it essential to account for heteroscedastic noise in causal discovery. In this work, we explore heteroscedastic symmetric noise models (HSNMs), where the effect $Y$ is modeled as $Y = f(X) + \sigma(X)N$, with $X$ as the cause and $N$ as independent noise following a symmetric distribution. We introduce a novel criterion for identifying HSNMs based on the skewness of the score (i.e., the gradient of the log density) of the data distribution. This criterion establishes a computationally tractable measurement that is zero in the causal direction but nonzero in the anticausal direction, enabling discovery of the causal direction. We extend this skewness-based criterion to the multivariate setting and propose SkewScore, an algorithm that handles heteroscedastic noise without requiring the extraction of exogenous noise. We also conduct a case study on the robustness of SkewScore in a bivariate model with a latent confounder, providing theoretical insights into its performance. Empirical studies further validate the effectiveness of the proposed method.
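The criterion can be illustrated numerically on a toy HSNM. The sketch below is not the SkewScore algorithm: it assumes a hypothetical model $Y = X + e^{aX}N$ with $X, N \sim \mathcal{N}(0, 1)$, uses the closed-form score of that model instead of estimating the score from data, and takes the absolute third moment of each score component as the skewness measure. The function names, the parameter `a`, and the decision rule (the effect is the variable whose score component is least skewed) are assumptions made for illustration.

```python
import numpy as np

A = 0.3  # hypothetical slope of log sigma(x); chosen only for illustration


def simulate_hsnm(n_samples, a=A, seed=0):
    """Draw (x, y) from the toy model Y = X + exp(a * X) * N, X and N ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    noise = rng.standard_normal(n_samples)  # symmetric, independent of x
    y = x + np.exp(a * x) * noise           # f(x) = x, sigma(x) = exp(a * x)
    return x, y


def analytic_score(x, y, a=A):
    """Closed-form gradient of log p(x, y) for the toy model above."""
    sigma = np.exp(a * x)
    n = (y - x) / sigma                      # noise value implied by (x, y)
    score_x = -x + a * (n**2 - 1.0) + n / sigma
    score_y = -n / sigma
    return score_x, score_y


def third_moment(score_component):
    """Absolute empirical third moment, used here as the skewness measure."""
    return float(np.abs(np.mean(score_component**3)))


def guess_effect(x, y, a=A):
    """Name the variable whose score component has the smaller third moment."""
    score_x, score_y = analytic_score(x, y, a)
    return "y" if third_moment(score_y) < third_moment(score_x) else "x"


if __name__ == "__main__":
    x, y = simulate_hsnm(200_000)
    score_x, score_y = analytic_score(x, y)
    print("third moment of score w.r.t. x:", third_moment(score_x))
    print("third moment of score w.r.t. y:", third_moment(score_y))
    print("guessed effect:", guess_effect(x, y))
```

In this toy setting the component with respect to `y` should have a third moment close to zero up to sampling error, while the component with respect to `x` stays clearly away from zero. With real data the joint density is unknown, so the score would have to be estimated from samples before any such comparison is possible.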