Learning from Convenience Samples: A Case Study on Fine-Tuning LLMs for Survey Non-response in the German Longitudinal Election Study

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Rising costs of probability sampling, together with random and systematic nonresponse in election surveys, leave self-reported vote choices missing. Method: This paper proposes fine-tuning lightweight open-source large language models (LLMs) with 3B–8B parameters on partially observed response data to impute missing vote choices. Unlike conventional inference that relies on random samples, the approach performs supervised fine-tuning on biased convenience samples (e.g., student populations) and evaluates it against zero-shot prompting. Results: Evaluated on the German Longitudinal Election Study, the fine-tuned LLMs match or exceed CatBoost—a state-of-the-art tabular classifier—under missing-completely-at-random (MCAR) conditions and substantially outperform zero-shot baselines. Under the more challenging missing-not-at-random (MNAR) conditions, fine-tuned models recover both individual-level predictions and population-level vote distributions more accurately, improving inferential reliability with nonprobability samples. This work establishes a paradigm for addressing systematic missingness and enabling scientifically rigorous use of convenience samples.
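The two missingness regimes the summary contrasts can be sketched on a toy survey frame. This is an illustrative simulation only: the column names and the 1,000-row synthetic data are assumptions, not the GLES variables, and the "systematic" mask (labels observed only for a student convenience sample) is a simplified stand-in for the paper's MNAR setup.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy survey frame; column names are illustrative, not from the GLES codebook.
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=1000),
    "is_student": rng.random(1000) < 0.15,
    "vote_choice": rng.choice(["A", "B", "C"], size=1000),
})

# MCAR: every respondent's vote choice is hidden with the same probability.
mcar_mask = rng.random(len(df)) < 0.3

# Systematic missingness: labels are observed only for a convenience
# subpopulation (here, students), so all non-students are missing.
systematic_mask = ~df["is_student"].to_numpy()

df_mcar = df.assign(vote_choice=df["vote_choice"].where(~mcar_mask))
df_systematic = df.assign(vote_choice=df["vote_choice"].where(~systematic_mask))
```

Under the MCAR mask the observed labels remain representative of the population; under the systematic mask the observed labels cover only the student subgroup, which is exactly the generalization gap the fine-tuned models are tested on.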

📝 Abstract
Survey researchers face two key challenges: the rising costs of probability samples and missing data (e.g., non-response or attrition), which can undermine inference and increase the use of convenience samples. Recent work explores using large language models (LLMs) to simulate respondents via persona-based prompts, often without labeled data. We study a more practical setting where partial survey responses exist: we fine-tune LLMs on available data to impute self-reported vote choice under both random and systematic nonresponse, using the German Longitudinal Election Study. We compare zero-shot prompting and supervised fine-tuning against tabular classifiers (e.g., CatBoost) and test how different convenience samples (e.g., students) used for fine-tuning affect generalization. Our results show that when data are missing completely at random, fine-tuned LLMs match tabular classifiers but outperform zero-shot approaches. When only biased convenience samples are available, fine-tuning small (3B to 8B) open-source LLMs can recover both individual-level predictions and population-level distributions more accurately than zero-shot and often better than tabular methods. This suggests fine-tuned LLMs offer a promising strategy for researchers working with non-probability samples or systematic missingness, and may enable new survey designs requiring only easily accessible subpopulations.
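Supervised fine-tuning on partial responses, as described above, requires serializing each respondent's covariates into a text example with the vote choice as the target. The sketch below is a hypothetical template: the field names, wording, and prompt/completion format are assumptions for illustration, not the paper's actual serialization.

```python
def make_example(respondent: dict) -> dict:
    """Turn one respondent's covariates into a fine-tuning example.

    The template and keys are illustrative; the paper does not publish
    its exact prompt format.
    """
    features = "\n".join(
        f"{key}: {value}"
        for key, value in respondent.items()
        if key != "vote_choice"
    )
    prompt = (
        "Given the following survey respondent, predict their vote choice.\n"
        f"{features}\nVote choice:"
    )
    # The observed label becomes the completion the model is trained to emit.
    return {"prompt": prompt, "completion": " " + respondent["vote_choice"]}

example = make_example(
    {"age": 24, "education": "university", "vote_choice": "Green"}
)
```

At inference time, the same prompt is rendered for respondents whose `vote_choice` is missing, and the model's generated completion serves as the imputed value; the zero-shot baseline uses a comparable prompt without any fine-tuning.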
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning LLMs to impute missing survey responses under nonresponse conditions
Comparing LLM fine-tuning with tabular classifiers for biased convenience samples
Developing methods to handle systematic missingness in survey data using accessible subpopulations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs on convenience samples for imputation
Comparing fine-tuned LLMs with zero-shot and tabular methods
Using small open-source LLMs to recover survey distributions