🤖 AI Summary
This paper addresses "learning-induced interference," a violation of the stable unit treatment value assumption (SUTVA) in which estimating latent outcomes from high-dimensional observational data via nonnegative matrix factorization (NMF) causes an individual's learned potential outcome to depend on other individuals' treatment assignments. To the authors' knowledge, this is the first work to formally define and resolve this problem. Methodologically, the paper proposes an algorithm that explicitly mitigates learning-induced interference, targeting consistent and more efficient causal effect estimation. Its theoretical contribution is to rigorously distinguish learning-induced interference from genuine interference in the data-generating process, and to establish a framework for causal identification and estimation under NMF-based latent structure learning. Simulations and a real-data analysis of cancer mutation profiles show that the method improves estimation accuracy over baseline approaches. The implementation is publicly available as the R package `causalLFO`.
📝 Abstract
In many fields – including genomics, epidemiology, natural language processing, social and behavioral sciences, and economics – it is increasingly important to address causal questions in the context of factor models or representation learning. In this work, we investigate causal effects on *latent outcomes* derived from high-dimensional observed data using nonnegative matrix factorization. To the best of our knowledge, this is the first study to formally address causal inference in this setting. A central challenge is that estimating a latent factor model can cause an individual's learned latent outcome to depend on other individuals' treatments, thereby violating the standard causal inference assumption of no interference. We formalize this issue as *learning-induced interference* and distinguish it from interference present in a data-generating process. To address this, we propose a novel, intuitive, and theoretically grounded algorithm to estimate causal effects on latent outcomes while mitigating learning-induced interference and improving estimation efficiency. We establish theoretical guarantees for the consistency of our estimator and demonstrate its practical utility through simulation studies and an application to cancer mutational signature analysis. All baseline and proposed methods are available in our open-source R package, `causalLFO`.
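The core phenomenon the abstract describes can be seen in a minimal sketch: when latent outcomes are learned by factorizing the *full* data matrix, perturbing one individual's observed row (as a stand-in for that individual receiving treatment) shifts the factorization and thus the learned latent outcomes of *untreated* individuals. The sketch below uses scikit-learn's `NMF` purely for illustration; it is not the paper's `causalLFO` method, and the matrix sizes, perturbation, and variable names (`X`, `W0`, `W1`) are assumptions for the demo.

```python
import numpy as np
from sklearn.decomposition import NMF

# Simulate nonnegative data with a true rank-3 structure (illustrative sizes).
rng = np.random.default_rng(0)
n, p, k = 20, 50, 3
W_true = rng.gamma(2.0, 1.0, size=(n, k))
H_true = rng.gamma(2.0, 1.0, size=(k, p))
X = W_true @ H_true

def learn_latent_outcomes(X):
    """Learn each individual's latent outcome (row of W) by NMF on the full matrix."""
    model = NMF(n_components=k, init="nndsvda", random_state=0, max_iter=1000)
    return model.fit_transform(X)  # W: n x k matrix of learned latent outcomes

W0 = learn_latent_outcomes(X)

# "Treat" individual 0 only: perturb that single observed row.
X_treated = X.copy()
X_treated[0] += 5.0

W1 = learn_latent_outcomes(X_treated)

# Learning-induced interference: although only individual 0's data changed,
# the learned latent outcomes of the other individuals also move, because
# the factorization (and hence every row of W) depends on the whole matrix.
interference = np.abs(W1[1:] - W0[1:]).max()
```

Nothing in the data-generating process couples individuals here; the dependence is created entirely by the joint estimation step, which is exactly the distinction the paper draws between learning-induced and genuine interference.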