Adaptive political surveys and GPT-4: Tackling the cold start problem with simulated user interactions

📅 2025-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Adaptive political questionnaires suffer from a cold-start problem: the user-interaction data needed to train their item-selection models is rarely available in advance. This paper proposes using GPT-4 to generate high-fidelity synthetic response data. A KL-divergence analysis shows that GPT-4 closely emulates the response distributions of different Swiss parties along ideological dimensions, aligning well with ground-truth data from the Smartvote voting advice application (VAA). Pre-training a Bayesian adaptive questionnaire model on the synthetic data substantially reduces response-prediction error and raises candidate-recommendation accuracy close to the oracle upper bound. The core contribution is the empirical demonstration that large language models can serve as reliable "data engines" for political attitude modelling, establishing a scalable, empirically verifiable paradigm for adaptive survey design in data-scarce settings.
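The distribution comparison mentioned above can be illustrated with a minimal sketch: computing the KL divergence between a real and a synthetic answer distribution for one survey item. The 4-point scale and the example probabilities below are hypothetical, not the paper's data.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for two discrete distributions over the same support."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Hypothetical shares of answers on a 4-point agreement scale
# ("strongly disagree" .. "strongly agree") for one Smartvote-style item.
real_answers      = [0.10, 0.25, 0.40, 0.25]   # observed voter distribution
synthetic_answers = [0.12, 0.22, 0.43, 0.23]   # LLM-generated distribution

divergence = kl_divergence(real_answers, synthetic_answers)
print(f"KL(real || synthetic) = {divergence:.4f}")  # small value => close match
```

A divergence near zero indicates the synthetic distribution tracks the real one closely; identical distributions give exactly zero.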

📝 Abstract
Adaptive questionnaires dynamically select the next question for a survey participant based on their previous answers. Due to digitalisation, they have become a viable alternative to traditional surveys in application areas such as political science. One limitation, however, is their dependency on data to train the model for question selection. Often, such training data (i.e., user interactions) are unavailable a priori. To address this problem, we (i) test whether Large Language Models (LLM) can accurately generate such interaction data and (ii) explore if these synthetic data can be used to pre-train the statistical model of an adaptive political survey. To evaluate this approach, we utilise existing data from the Swiss Voting Advice Application (VAA) Smartvote in two ways: First, we compare the distribution of LLM-generated synthetic data to the real distribution to assess its similarity. Second, we compare the performance of an adaptive questionnaire that is randomly initialised with one pre-trained on synthetic data to assess their suitability for training. We benchmark these results against an "oracle" questionnaire with perfect prior knowledge. We find that an off-the-shelf LLM (GPT-4) accurately generates answers to the Smartvote questionnaire from the perspective of different Swiss parties. Furthermore, we demonstrate that initialising the statistical model with synthetic data can (i) significantly reduce the error in predicting user responses and (ii) increase the candidate recommendation accuracy of the VAA. Our work emphasises the considerable potential of LLMs to create training data to improve the data collection process in adaptive questionnaires in LLM-affine areas such as political surveys.
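The adaptive step described in the abstract, choosing the next question based on a participant's previous answers, can be sketched with a simple uncertainty-sampling heuristic over a 1-D latent position. The 2-parameter logistic item model, the function names, and the parameter values below are illustrative assumptions, not the paper's actual model or selection rule.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def select_next_item(theta_samples, item_params, asked):
    """Pick the unasked item whose predicted answer is most uncertain.

    theta_samples: posterior samples of the participant's latent position.
    item_params:   (discrimination, location) per item (assumed 2-PL model).
    asked:         set of item indices already administered.
    """
    best_item, best_score = None, -1.0
    for j, (a, b) in enumerate(item_params):
        if j in asked:
            continue
        # Expected probability of agreement, averaged over posterior samples.
        p = sigmoid(a * (theta_samples - b)).mean()
        score = 1.0 - abs(p - 0.5) * 2  # p near 0.5 => answer hard to predict
        if score > best_score:
            best_item, best_score = j, score
    return best_item

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=500)         # posterior over the user's position
items = [(1.0, -2.0), (1.0, 0.1), (1.0, 2.5)]  # hypothetical item parameters
print(select_next_item(theta, items, asked=set()))
```

Pre-training on synthetic data, in this framing, amounts to starting from informative item parameters and a non-uniform prior over positions instead of a random initialisation.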
Problem

Research questions and friction points this paper is trying to address.

Addressing the cold-start problem in adaptive political surveys
Using GPT-4 to generate synthetic user-interaction data
Improving adaptive questionnaire performance with synthetic training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate synthetic user interaction data.
Synthetic data pre-trains adaptive survey models.
Pre-training on GPT-4 data reduces response-prediction error.