🤖 AI Summary
Election surveys face rising costs of probability sampling and random or systematic nonresponse, which leaves self-reported vote choices missing. Method: This paper proposes fine-tuning lightweight open-source large language models (LLMs) with 3B–8B parameters on partially observed response data to impute missing vote choices. In addition to the random-missingness setting, the models are fine-tuned on biased convenience samples (e.g., student populations) and compared against zero-shot prompting and tabular classifiers. Results: Evaluated on the German Longitudinal Election Study, the fine-tuned LLMs match CatBoost, a gradient-boosted tabular classifier, under missing-completely-at-random (MCAR) conditions and outperform zero-shot baselines. Under the more challenging systematic-nonresponse conditions, where only biased convenience samples are available, fine-tuned models recover both individual-level predictions and population-level vote distributions more accurately than zero-shot prompting and often more accurately than tabular methods. The work suggests that fine-tuned LLMs are a promising strategy for handling systematic missingness and for drawing inferences from nonprobability samples.
📝 Abstract
Survey researchers face two key challenges: the rising costs of probability samples and missing data (e.g., non-response or attrition), which can undermine inference and increase the use of convenience samples. Recent work explores using large language models (LLMs) to simulate respondents via persona-based prompts, often without labeled data. We study a more practical setting where partial survey responses exist: we fine-tune LLMs on available data to impute self-reported vote choice under both random and systematic nonresponse, using the German Longitudinal Election Study. We compare zero-shot prompting and supervised fine-tuning against tabular classifiers (e.g., CatBoost) and test how different convenience samples (e.g., students) used for fine-tuning affect generalization.
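To make the fine-tuning setup concrete, the following is a minimal sketch of how partially observed survey records might be turned into supervised examples. The covariate names, the prompt wording, the party labels, and the helper functions `to_prompt` and `build_examples` are all illustrative assumptions; the abstract does not specify the actual variables or prompt template.

```python
import json

# Hypothetical field name for the target; not taken from the survey codebook.
TARGET = "vote"


def to_prompt(respondent: dict) -> str:
    """Render a respondent's covariates as a persona-style text prompt."""
    facts = "; ".join(f"{k}: {v}" for k, v in respondent.items() if k != TARGET)
    return f"Respondent profile: {facts}.\nWhich party did this respondent vote for?"


def build_examples(respondents: list) -> tuple:
    """Split records into fine-tuning pairs (vote observed)
    and prompts whose vote choice still has to be imputed (vote missing)."""
    train, to_impute = [], []
    for r in respondents:
        prompt = to_prompt(r)
        if r.get(TARGET) is None:
            to_impute.append({"prompt": prompt})
        else:
            train.append({"prompt": prompt, "completion": r[TARGET]})
    return train, to_impute


# Toy records with placeholder values (not real survey data).
respondents = [
    {"age": 22, "education": "university", "region": "Berlin", "vote": "Party A"},
    {"age": 67, "education": "vocational", "region": "Bavaria", "vote": None},
]
train, to_impute = build_examples(respondents)
print(json.dumps(train[0], indent=2))
```

Records with an observed vote become prompt/completion pairs for supervised fine-tuning, while the same prompt template without a completion is what a zero-shot model or the fine-tuned model would be asked to answer for nonrespondents.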
Our results show that when data are missing completely at random, fine-tuned LLMs match tabular classifiers and outperform zero-shot approaches. When only biased convenience samples are available, fine-tuning small open-source LLMs (3B to 8B parameters) can recover both individual-level predictions and population-level distributions more accurately than zero-shot prompting and often more accurately than tabular methods. This suggests that fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missingness, and may enable new survey designs that require only easily accessible subpopulations.
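The two evaluation settings and the two levels of accuracy can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy records, the `is_student` flag, the majority-vote stand-in for a trained model, and the use of total variation distance as the population-level metric (the abstract does not name the metric used).

```python
import random
from collections import Counter


def mcar_mask(n, frac_observed, rng):
    """MCAR: every respondent is observed with the same probability."""
    return [rng.random() < frac_observed for _ in range(n)]


def convenience_mask(rows, in_sample):
    """Systematic missingness: only an easy-to-reach subgroup is observed."""
    return [in_sample(r) for r in rows]


def accuracy(y_true, y_pred):
    """Individual-level metric: share of correctly imputed vote choices."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def total_variation(y_true, y_pred):
    """Population-level metric: half the L1 distance between vote shares."""
    n = len(y_true)
    ct, cp = Counter(y_true), Counter(y_pred)
    return 0.5 * sum(abs(ct[k] / n - cp[k] / n) for k in set(ct) | set(cp))


# Toy data with placeholder party labels (not real survey results).
rows = (
    [{"is_student": True, "vote": v} for v in ["A", "A", "A", "B"]]
    + [{"is_student": False, "vote": v} for v in ["B", "B", "B", "B", "A", "C"]]
)

# Only students are observed; everyone else must be imputed.
observed = convenience_mask(rows, lambda r: r["is_student"])
seen = [r["vote"] for r, o in zip(rows, observed) if o]
y_true = [r["vote"] for r, o in zip(rows, observed) if not o]

# Stand-in imputer: predict the majority vote among observed respondents.
majority = Counter(seen).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print("accuracy:", accuracy(y_true, y_pred))
print("total variation:", total_variation(y_true, y_pred))

# For comparison, an MCAR mask over the same rows.
rng = random.Random(0)
print("MCAR mask:", mcar_mask(len(rows), 0.5, rng))
```

In this toy case the naive imputer trained on the student subgroup does poorly on both metrics, which is the failure mode that a model able to generalize beyond the convenience sample would need to avoid.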