🤖 AI Summary
This study addresses the challenge of estimating dynamic causal effects from unstructured data such as text, images, and video, which existing methods typically treat as single, static observations. The authors propose a statistical framework that combines generative artificial intelligence (GenAI) with marginal structural models. The method first extracts internal representations of each unstructured object from a GenAI model, then fits a neural network that jointly learns a deconfounder for each treatment feature in the sequence. This supports causal inference on sequences of treatment features, including their position within a text or video, and the semiparametric inference framework yields asymptotically valid confidence intervals. In simulations, the estimator recovers the target causal effects and the confidence intervals achieve nominal coverage in finite samples. An application to a randomized experiment on the Hong Kong protests shows that the effect of a treatment feature depends strongly on its position within the text.
📝 Abstract
A growing number of scholars seek to estimate causal effects of unstructured data such as text, images, and video. However, existing methods typically treat each object as a single, static observation. We develop a statistical framework for dynamic causal inference with unstructured data by leveraging generative artificial intelligence (GenAI) models. Our approach enables researchers to estimate the causal effects of sequences of treatment features, including their positions within text and video. We first extract internal representations of unstructured objects from a GenAI model and then estimate a marginal structural model using a neural network architecture that jointly learns a deconfounder for each treatment feature in the sequence. Our semiparametric inference framework yields valid asymptotic confidence intervals. Simulation studies demonstrate that the proposed estimator recovers the target causal effects and that the confidence intervals achieve nominal coverage in finite samples. We further apply our method to a randomized experiment on the Hong Kong protests, showing that the effect of a treatment feature depends critically on its position within the text.
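To make the role of the marginal structural model concrete, the sketch below fits a textbook two-period version with inverse probability weights. It is not the authors' estimator: there is no GenAI representation and no neural deconfounder, the treatment probabilities are assumed known, and every name and number (`simulate`, `fit_msm`, the coefficients) is hypothetical. It only illustrates how position-specific effects of a treatment sequence can be read off as separate coefficients once time-varying confounding is weighted away.

```python
import numpy as np


def simulate(n, rng):
    """Toy two-period sequence with a time-varying confounder (all values hypothetical)."""
    l1 = rng.binomial(1, 0.5, n)                      # baseline confounder
    p1 = 0.3 + 0.4 * l1                               # P(A1 = 1 | L1)
    a1 = rng.binomial(1, p1)                          # treatment feature at position 1
    l2 = rng.binomial(1, 0.3 + 0.3 * a1 + 0.2 * l1)   # confounder affected by A1
    p2 = 0.2 + 0.5 * l2                               # P(A2 = 1 | L2)
    a2 = rng.binomial(1, p2)                          # treatment feature at position 2
    y = 1.0 * a1 + 2.0 * a2 + 1.5 * l1 + 1.5 * l2 + rng.normal(0.0, 1.0, n)
    return a1, a2, p1, p2, y


def fit_msm(a1, a2, p1, p2, y):
    """Weighted least squares for E[Y(a1, a2)] = b0 + b1 * a1 + b2 * a2."""
    # Inverse probability of the observed treatment sequence.
    prob_a1 = np.where(a1 == 1, p1, 1.0 - p1)
    prob_a2 = np.where(a2 == 1, p2, 1.0 - p2)
    w = 1.0 / (prob_a1 * prob_a2)
    # Scaling rows by sqrt(w) turns ordinary least squares into weighted least squares.
    X = np.column_stack([np.ones_like(y), a1, a2])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


rng = np.random.default_rng(0)
beta = fit_msm(*simulate(200_000, rng))
print("intercept, position-1 effect, position-2 effect:", beta)
```

Under this simulated design the marginal effects are 1.45 at position 1 (a direct effect of 1.0 plus 1.5 × 0.3 through the later confounder) and 2.0 at position 2, so the same binary feature has a different effect depending on where it appears in the sequence. A regression of the outcome on the treatments that simply adjusts for `l2` would block part of the position-1 effect, which is the kind of problem marginal structural models are designed to avoid.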