AI Summary
This paper addresses the unreliability and bias of econometric inference when human data are scarce. It introduces Agentic Economic Modeling (AEM), a framework that aligns large language model (LLM)-generated synthetic choices with small-sample human evidence. AEM combines task-conditioned synthetic choice generation with a bias-correction model that maps task features and raw LLM choices to human-aligned choices; standard econometric estimators are then applied to the corrected choices to recover demand elasticities and treatment effects. In a conjoint study with millions of observations, fitting the correction model on only 10% of the original data lowers the error of the demand-parameter estimates, whereas uncorrected LLM choices increase it. In a regional field experiment, a model calibrated on 10% of geographic regions estimates an out-of-domain treatment effect of -65±10 bps, close to the -60±8 bps of the full human experiment. Under extrapolation over time, training on day-one human data alone yields a far tighter estimate than the human-only day-one baseline.
Abstract
We introduce Agentic Economic Modeling (AEM), a framework that aligns synthetic LLM choices with small-sample human evidence for reliable econometric inference. AEM first generates task-conditioned synthetic choices via LLMs, then learns a bias-correction mapping from task features and raw LLM choices to human-aligned choices, upon which standard econometric estimators perform inference to recover demand elasticities and treatment effects. We validate AEM in two experiments. In a large-scale conjoint study with millions of observations, using only 10% of the original data to fit the correction model lowers the error of the demand-parameter estimates, while uncorrected LLM choices even increase the errors. In a regional field experiment, a mixture model calibrated on 10% of geographic regions estimates an out-of-domain treatment effect of -65±10 bps, closely matching the full human experiment (-60±8 bps). Under time-wise extrapolation, training with only day-one human data yields -24 bps (95% CI: [-26, -22], p<1e-5), improving over the human-only day-one baseline (-17 bps, 95% CI: [-43, +9], p=0.2049). These results demonstrate AEM's potential to improve RCT efficiency and establish a foundational method for LLM-based counterfactual generation.
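The three-stage pipeline described in the abstract (generate synthetic choices, correct them against a small human sample, then run a standard estimator) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the binary-logit demand curve, the parameter values, the noise-free purchase shares, and a correction that is a simple linear map in log-odds space. For brevity the correction uses only the raw LLM choice share, whereas the paper's mapping also takes task features as input.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def logit(s):
    return math.log(s / (1.0 - s))

def ols(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical setup: 20 price levels ("tasks") with noise-free purchase shares.
prices = [1.0 + 0.5 * i for i in range(20)]
TRUE_BETA = -0.5  # assumed human price coefficient
human_share = [sigmoid(2.0 + TRUE_BETA * p) for p in prices]
# Assumed LLM bias: synthetic respondents are too insensitive to price.
llm_share = [sigmoid(1.0 - 0.2 * p) for p in prices]

# Stage 1: raw LLM choice shares are available for every task.
llm_logit = [logit(s) for s in llm_share]

# Stage 2: fit the correction on 10% of tasks (2 of 20) that have human data.
calib = [0, 10]
c0, c1 = ols([llm_logit[i] for i in calib],
             [logit(human_share[i]) for i in calib])
corrected_logit = [c0 + c1 * z for z in llm_logit]

# Stage 3: standard estimator, here a log-odds regression on price.
_, beta_raw = ols(prices, llm_logit)
_, beta_corrected = ols(prices, corrected_logit)

print(f"raw LLM beta:   {beta_raw:.3f}")
print(f"corrected beta: {beta_corrected:.3f}")
print(f"true beta:      {TRUE_BETA:.3f}")
```

In this noise-free toy the LLM bias is exactly linear in log-odds, so two calibration tasks are enough to recover the assumed human coefficient, while the uncorrected LLM shares give a much flatter slope. With real, noisy choices the correction would be approximate and would need more calibration data and a richer mapping.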