🤖 AI Summary
This study addresses two challenges that unstructured data (text, images) pose for causal inference and predictive modeling: representing the data in a form suitable for statistical analysis, and quantifying uncertainty. To this end, we propose the GenAI-Powered Inference (GPI) framework, which uses off-the-shelf open-source generative models (e.g., LLMs, diffusion models) without fine-tuning, both to generate unstructured data at scale and to extract low-dimensional representations of it. Machine learning applied to these representations then yields estimates of causal and predictive effects together with their estimation uncertainty. Its core innovation lies in decoupling representation extraction from downstream inference: because the generative model is never fine-tuned, the method stays computationally efficient and broadly accessible. We illustrate GPI with three applications: Chinese social media censorship, the predictive effects of candidates' facial appearance on electoral outcomes, and the persuasiveness of political rhetoric. An open-source software package implementing GPI is available.
📝 Abstract
We introduce GenAI-Powered Inference (GPI), a statistical framework for both causal and predictive inference using unstructured data, including text and images. GPI leverages open-source Generative Artificial Intelligence (GenAI) models, such as large language models and diffusion models, not only to generate unstructured data at scale but also to extract low-dimensional representations that capture the underlying structure of those data. Applying machine learning to these representations, GPI enables estimation of causal and predictive effects while quantifying the associated estimation uncertainty. Unlike existing approaches to representation learning, GPI does not require fine-tuning of generative models, making it computationally efficient and broadly accessible. We illustrate the versatility of the GPI framework through three applications: (1) analyzing Chinese social media censorship, (2) estimating predictive effects of candidates' facial appearance on electoral outcomes, and (3) assessing the persuasiveness of political rhetoric. An open-source software package is available for implementing GPI.
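To make the "machine learning on representations, with uncertainty" step concrete, here is a minimal sketch. It is not the GPI estimator or the API of the released package. It swaps in a generic technique, a cross-fitted augmented inverse-probability-weighted (AIPW) estimator, and replaces real model representations with simulated vectors. All names (`simulate`, `cross_fit_aipw`, `R`, `T`, `Y`) and the data-generating process are assumptions made for illustration.

```python
import numpy as np


def _with_intercept(X):
    """Prepend a column of ones to the design matrix."""
    return np.column_stack([np.ones(len(X)), X])


def fit_ridge(X, y, lam=1e-2):
    """Ridge-penalized linear regression (outcome model)."""
    Xb = _with_intercept(X)
    return np.linalg.solve(Xb.T @ Xb + lam * np.eye(Xb.shape[1]), Xb.T @ y)


def fit_logistic(X, t, lam=1e-2, n_iter=25):
    """Ridge-penalized logistic regression via Newton steps (propensity model)."""
    Xb = _with_intercept(X)
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (p - t) + lam * beta
        hess = (Xb * (p * (1 - p))[:, None]).T @ Xb + lam * np.eye(Xb.shape[1])
        beta = beta - np.linalg.solve(hess, grad)
    return beta


def simulate(n=2000, dim=5, true_effect=1.0, seed=0):
    """ASSUMPTION: simulated stand-ins for low-dimensional representations R,
    a binary treatment T that depends on R, and an outcome Y."""
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(n, dim))
    propensity = 1.0 / (1.0 + np.exp(-R[:, 0]))
    T = rng.binomial(1, propensity).astype(float)
    Y = true_effect * T + R @ np.linspace(0.5, -0.5, dim) + rng.normal(size=n)
    return R, T, Y


def cross_fit_aipw(R, T, Y, n_folds=2, seed=0):
    """Cross-fitted AIPW estimate of the average treatment effect and its
    standard error, using the representations R as adjustment variables."""
    n = len(Y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    psi = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        # Nuisance models are fit on the training folds only.
        e = 1.0 / (1.0 + np.exp(-_with_intercept(R[te]) @ fit_logistic(R[tr], T[tr])))
        e = np.clip(e, 0.05, 0.95)  # guard against extreme weights
        treated, control = tr & (T == 1), tr & (T == 0)
        mu1 = _with_intercept(R[te]) @ fit_ridge(R[treated], Y[treated])
        mu0 = _with_intercept(R[te]) @ fit_ridge(R[control], Y[control])
        # Doubly robust score evaluated on the held-out fold.
        psi[te] = (mu1 - mu0
                   + T[te] * (Y[te] - mu1) / e
                   - (1 - T[te]) * (Y[te] - mu0) / (1 - e))
    return psi.mean(), psi.std(ddof=1) / np.sqrt(n)


if __name__ == "__main__":
    R, T, Y = simulate()
    ate, se = cross_fit_aipw(R, T, Y)
    print(f"ATE estimate: {ate:.3f}, 95% CI: [{ate - 1.96 * se:.3f}, {ate + 1.96 * se:.3f}]")
```

In an actual application, `R` would be the representations extracted from the generative model rather than simulated draws. The standard error comes from the sample variance of the doubly robust scores, which is one common way to attach uncertainty to such an estimate.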