🤖 AI Summary
This work addresses the challenge of efficiently eliciting group-level information under limited query and participant budgets, particularly in settings with missing data and incomplete responses. We propose the first closed-loop adaptive survey framework that jointly optimizes question generation and respondent selection. Our approach leverages large language models to estimate the expected information gain of candidate questions and employs a heterogeneous graph neural network to aggregate known responses and individual attributes, enabling imputation of missing information and guiding subsequent sampling. By integrating structure-aware inference of missing responses with an active querying strategy, the method significantly improves the accuracy of group attribute prediction across three real-world opinion datasets, achieving a relative gain of over 12% on the CES dataset under a 10% respondent budget.
📝 Abstract
Eliciting information to reduce uncertainty about latent group-level properties from surveys and other collective assessments requires allocating limited questioning effort under real costs and missing data. Although large language models enable adaptive, multi-turn interactions in natural language, most existing elicitation methods optimize what to ask over a fixed respondent pool, and do not adapt respondent selection or leverage population structure when responses are partial or incomplete. To address this gap, we study adaptive group elicitation, a multi-round setting where an agent adaptively selects both questions and respondents under explicit query and participation budgets. We propose a theoretically grounded framework that combines (i) an LLM-based expected information gain objective for scoring candidate questions with (ii) heterogeneous graph neural network propagation that aggregates observed responses and participant attributes to impute missing responses and guide per-round respondent selection. This closed-loop procedure queries a small, informative subset of individuals while inferring population-level responses via structured similarity. Across three real-world opinion datasets, our method consistently improves population-level response prediction under constrained budgets, including a >12% relative gain on CES at a 10% respondent budget.
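To make the closed loop concrete, here is a minimal sketch of one round, assuming binary questions. It is not the paper's implementation: the similarity-weighted averaging stands in for the heterogeneous GNN propagation, and per-question answer entropy stands in for the LLM-based expected information gain score; the data and function names (`impute`, `question_scores`, `select_round`) are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy of a Bernoulli probability (binary questions)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def impute(responses, similarity):
    """Fill missing answers by similarity-weighted averaging over observed
    neighbors (a simple stand-in for heterogeneous GNN propagation)."""
    n, m = len(responses), len(responses[0])
    filled = [row[:] for row in responses]
    for i in range(n):
        for q in range(m):
            if responses[i][q] is None:
                num = den = 0.0
                for j in range(n):
                    if j != i and responses[j][q] is not None:
                        num += similarity[i][j] * responses[j][q]
                        den += similarity[i][j]
                filled[i][q] = num / den if den > 0 else 0.5
    return filled

def question_scores(filled):
    """Proxy for expected information gain: entropy of the imputed
    population-level answer rate for each candidate question."""
    n, m = len(filled), len(filled[0])
    return [entropy(sum(row[q] for row in filled) / n) for q in range(m)]

def select_round(responses, similarity, k_resp=1):
    """One round of the closed loop: pick the highest-scoring question and
    the k_resp respondents with the fewest observed answers."""
    filled = impute(responses, similarity)
    scores = question_scores(filled)
    best_q = max(range(len(scores)), key=lambda q: scores[q])
    order = sorted(range(len(responses)),
                   key=lambda i: sum(r is not None for r in responses[i]))
    return best_q, order[:k_resp]

# Toy data: 3 respondents, 2 questions, None = unobserved.
responses = [[1, None], [0, None], [None, 1]]
similarity = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
q, who = select_round(responses, similarity, k_resp=1)
# Question 0 splits the imputed population (entropy 1 bit), so it is chosen.
```

In the real framework the selected respondents answer the chosen question, their responses are added to the graph, and the imputation and scores are refreshed before the next round, which is the closed loop the abstract describes.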