🤖 AI Summary
Existing evaluation metrics for survey simulation are fragmented and lack standardization, and they often overlook the critical dimension of ranking alignment, which hinders meaningful comparison of model performance. To address this gap, this work proposes RADIUS, a two-dimensional evaluation framework that jointly incorporates ranking alignment and distribution alignment. RADIUS systematically assesses how well large language models simulate survey responses by integrating ranking consistency measures, distributional similarity metrics (e.g., KL divergence), and statistical significance testing. The framework not only exposes the limitations of conventional metrics but also establishes a more reliable, comparable, and decision-relevant benchmark. To foster standardized evaluation practices in the research community, the authors open-source the RADIUS implementation.
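The summary does not spell out the underlying formulas, so the sketch below shows one plausible way to instantiate a ranking/distribution/significance check with standard SciPy statistics: Spearman rank correlation for ranking alignment, KL divergence for distribution alignment, and a chi-square test for significance. The function name `alignment_report` and the specific metric choices are illustrative assumptions, not the actual RADIUS API.

```python
import numpy as np
from scipy.stats import spearmanr, entropy, chisquare

def alignment_report(human_counts, sim_counts):
    """Hypothetical two-dimensional check of simulated vs. human
    response-option counts (same option order in both arrays)."""
    human_counts = np.asarray(human_counts, dtype=float)
    sim_counts = np.asarray(sim_counts, dtype=float)
    human_p = human_counts / human_counts.sum()
    sim_p = sim_counts / sim_counts.sum()

    # Ranking alignment: rank correlation of option popularity,
    # with spearmanr's built-in significance test.
    rho, rank_pvalue = spearmanr(human_p, sim_p)

    # Distribution alignment: KL divergence D(human || simulated);
    # 0 means the two distributions are identical.
    kl = entropy(human_p, sim_p)

    # Significance of the distributional gap: chi-square test of the
    # simulated counts against counts expected under the human distribution.
    expected = human_p * sim_counts.sum()
    _, dist_pvalue = chisquare(sim_counts, expected)

    return {
        "spearman_rho": rho, "rank_pvalue": rank_pvalue,
        "kl_divergence": kl, "dist_pvalue": dist_pvalue,
        "top_option_match": bool(np.argmax(human_p) == np.argmax(sim_p)),
    }

print(alignment_report([50, 30, 20], [35, 40, 25]))
```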
📝 Abstract
Simulation of surveys using LLMs is emerging as a powerful application for generating human-like responses at scale. Prior work evaluates survey simulation with metrics borrowed from other domains, which are often ad hoc, fragmented, and non-standardized, leading to results that are difficult to compare. Moreover, existing metrics focus mainly on accuracy or distributional measures, overlooking the critical dimension of ranking alignment. In practice, a simulation can achieve high accuracy while still failing to capture the option most preferred by humans, a distinction that is critical in decision-making applications. We introduce RADIUS, a comprehensive two-dimensional alignment suite for survey simulation that captures: 1) RAnking alignment and 2) DIstribUtion alignment, each complemented by statistical Significance testing. RADIUS highlights the limitations of existing metrics, enables more meaningful evaluation of survey simulation, and provides an open-source implementation for reproducible and comparable assessment.
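To make the ranking-vs-accuracy distinction concrete, here is a small invented example (the numbers are illustrative, not from the paper): two response distributions can be nearly indistinguishable under a distributional metric while disagreeing on which option is most preferred.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical 3-option question: the simulated distribution is very
# close to the human one, yet it flips the top two options.
human = np.array([0.35, 0.33, 0.32])
simulated = np.array([0.33, 0.35, 0.32])

print("KL divergence:", round(entropy(human, simulated), 5))         # ~0.00118, i.e. "accurate"
print("Same top option:", np.argmax(human) == np.argmax(simulated))  # False
```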