What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models

📅 2025-09-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study evaluates the potential of large language models (LLMs) to inform social policymaking, focusing on homelessness alleviation, a challenge affecting over 150 million people worldwide. Because generic policy recommendations can fit poorly with region-specific socioeconomic conditions, it introduces a decision-making benchmark for social policy that spans four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China) and is grounded in the Capability Approach. It also couples the benchmarked policies with an agent-based model through an automated pipeline, so that the social impact of recommended policies can be explored in simulated scenarios. The evaluation measures how closely LLM choices align with those of domain experts and with each other. The results suggest promising potential: with responsible guardrails and contextual calibration developed together with local domain experts, LLMs can offer humans alternative policies at scale.

📝 Abstract

Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their capacity to process vast amounts of unstructured data, explore flexible scenarios, and handle a diversity of contextual factors can make them uniquely suited to provide new insights into the complexity of social policymaking. This article evaluates whether LLMs are aligned with domain experts (and among themselves) to inform social policymaking on the subject of homelessness alleviation, a challenge affecting over 150 million people worldwide. We develop a novel benchmark composed of decision scenarios with policy choices across four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China). The policies in scope are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios. The paper's results reveal promising potential to leverage LLMs for social policymaking. If responsible guardrails and contextual calibrations are introduced in collaboration with local domain experts, LLMs can provide humans with valuable insights, in the form of alternative policies at scale.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM alignment with experts on homelessness policy

Assessing LLM decision-making across four geographic contexts

Measuring social impact of LLM-generated policy recommendations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel benchmark with decision scenarios across geographies

Automated pipeline connecting policies to agent-based model

Simulated social scenarios to explore policy impact
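
As a rough illustration of how agreement between LLM and expert choices on a benchmark like this might be tallied per geography, the sketch below uses a simple match rate. It is not the paper's metric or pipeline. The function name `agreement_by_region`, the record layout (region, LLM choice, expert choice), and the toy records are all assumptions made for illustration, and the values are made up rather than taken from the paper.

```python
from collections import defaultdict


def agreement_by_region(records):
    """Return {region: share of scenarios where the LLM and expert picked the same policy}.

    `records` is an iterable of (region, llm_choice, expert_choice) tuples.
    This layout is a hypothetical one chosen for the sketch.
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for region, llm_choice, expert_choice in records:
        totals[region] += 1
        if llm_choice == expert_choice:
            matches[region] += 1
    return {region: matches[region] / totals[region] for region in totals}


# Toy, made-up records; these are NOT results from the paper.
toy_records = [
    ("South Bend", "A", "A"),
    ("South Bend", "B", "A"),
    ("Barcelona", "C", "C"),
    ("Johannesburg", "A", "B"),
    ("Macau SAR", "B", "B"),
]

rates = agreement_by_region(toy_records)
for region, rate in rates.items():
    print(f"{region}: {rate:.2f}")
```

A raw match rate ignores agreement that would occur by chance, so a chance-corrected statistic may be preferable when the number of policy options per scenario is small.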
