AI Summary
This paper addresses causal inference from text in randomized experiments, focusing on three core questions: (1) Does treatment affect text generation? (2) Through which latent semantic outcomes does the effect manifest? (3) Is the causal description complete? To this end, we propose a verifiable framework for causal inference on text: large language models (LLMs) automatically generate testable hypotheses about textual differences, and sample splitting, doubly robust estimation, and human-annotated validation are combined to achieve statistically rigorous and reproducible causal identification. Our method integrates text representation learning with causal inference theory, overcoming the limitations of prior approaches that rely on predefined keywords or manually specified outcomes. A proof-of-concept evaluation on abstracts of academic manuscripts shows that the framework detects semantic differences between groups, produces causal-effect statements with valid p-values, and improves both the automation and the statistical rigor of text-based causal discovery.
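As a concrete illustration of the split-then-validate workflow summarized above, the sketch below divides documents into a discovery half, which the LLM sees when proposing a hypothesis, and a validation half, on which human-coded labels for that hypothesis are tested with a standard two-sample z-test. The helper names (`split_sample`, `validate_hypothesis`) and the binary-label setup are illustrative assumptions, not the paper's released code.

```python
import numpy as np
from scipy import stats

def split_sample(n_docs, seed=0):
    """Randomly split document indices into discovery and validation halves.

    The LLM only ever sees the discovery half, so its hypotheses are
    formed independently of the data later used for testing.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_docs)
    half = n_docs // 2
    return idx[:half], idx[half:]

def validate_hypothesis(labels, treated):
    """Two-sample z-test on human-coded binary labels from the validation split.

    labels[i] = 1 if annotators judge document i to exhibit the
    LLM-suggested property; treated[i] is the randomized assignment.
    Returns the treatment-control difference in label means and its p-value.
    """
    labels = np.asarray(labels, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    y1, y0 = labels[treated], labels[~treated]
    diff = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    p_value = 2 * stats.norm.sf(abs(diff) / se)  # two-sided
    return diff, p_value
```

Because the hypothesis is fixed before the validation labels are examined, the resulting p-value comes from a pre-registered two-sample comparison rather than a post-hoc search, which is what makes the inference valid.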
Abstract
We propose a machine-learning tool that yields causal inference on text in randomized trials. Based on a simple econometric framework in which text may capture outcomes of interest, our procedure addresses three questions: First, is the text affected by the treatment? Second, on which outcomes does the effect operate? And third, how complete is our description of causal effects? To answer all three questions, our approach uses large language models (LLMs) to suggest systematic differences between two groups of text documents and then provides valid inference based on costly validation. Specifically, we highlight the need for sample splitting to allow for statistical validation of LLM outputs, as well as the need for human labeling to validate substantive claims about how documents differ across groups. We illustrate the tool in a proof-of-concept application using abstracts of academic manuscripts.
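For the first question, whether the treatment affects the text at all, one generic approach compatible with the sample-splitting logic above is a classifier two-sample test: fit a treatment classifier on the discovery split and check whether its held-out accuracy on the validation split beats chance. This is a minimal sketch of that idea under an assumed 50/50 randomization, not the paper's estimator; the function name and the TF-IDF/logistic-regression pipeline are our illustrative choices.

```python
import numpy as np
from scipy import stats
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def classifier_two_sample_test(texts, treated, train_idx, test_idx):
    """Train a classifier to predict treatment status from text on the
    discovery split; if its held-out accuracy beats chance, the treatment
    affects the text. Assumes balanced 50/50 random assignment.
    """
    treated = np.asarray(treated)
    vec = TfidfVectorizer(min_df=2)
    X_train = vec.fit_transform([texts[i] for i in train_idx])
    X_test = vec.transform([texts[i] for i in test_idx])
    clf = LogisticRegression(max_iter=1000).fit(X_train, treated[train_idx])
    correct = int((clf.predict(X_test) == treated[test_idx]).sum())
    # Under the null of no effect on text, held-out accuracy is 50%;
    # a one-sided binomial test checks for above-chance accuracy.
    p_value = stats.binomtest(correct, len(test_idx), p=0.5,
                              alternative="greater").pvalue
    return correct / len(test_idx), p_value
```

A rejection answers the first question affirmatively; the LLM-suggested hypotheses and human-labeled validation then address which outcomes drive the effect and how complete the description is.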