What Would an LLM Do? Evaluating Policymaking Capabilities of Large Language Models

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study evaluates the potential of large language models (LLMs) in social policy formulation, focusing on homelessness, which affects over 150 million people worldwide. To address the poor contextual alignment between generic policy recommendations and region-specific socioeconomic conditions, we introduce the first multi-regional (four-region) decision-making benchmark for social policy, grounded in the Capability Approach framework. We couple LLM-generated policy proposals with agent-based simulation models to enable end-to-end validation, from recommendation generation to social impact assessment. Experiments demonstrate that LLMs produce policy proposals exhibiting high agreement with domain experts; after calibration by local practitioners, proposal feasibility and diversity improve significantly. Our work establishes a novel, empirically grounded paradigm for leveraging LLMs in equitable, interpretable, and scalable social policy design.

📝 Abstract
Large language models (LLMs) are increasingly being adopted in high-stakes domains. Their capacity to process vast amounts of unstructured data, explore flexible scenarios, and handle a diversity of contextual factors can make them uniquely suited to provide new insights into the complexity of social policymaking. This article evaluates whether LLMs are aligned with domain experts (and among themselves) to inform social policymaking on the subject of homelessness alleviation, a challenge affecting over 150 million people worldwide. We develop a novel benchmark composed of decision scenarios with policy choices across four geographies (South Bend, USA; Barcelona, Spain; Johannesburg, South Africa; Macau SAR, China). The policies in scope are grounded in the conceptual framework of the Capability Approach for human development. We also present an automated pipeline that connects the benchmarked policies to an agent-based model, and we explore the social impact of the recommended policies through simulated social scenarios. The results reveal promising potential to leverage LLMs for social policymaking. If responsible guardrails and contextual calibrations are introduced in collaboration with local domain experts, LLMs can provide humans with valuable insights in the form of alternative policies at scale.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM alignment with experts on homelessness policy
Assessing LLM decision-making across four geographic contexts
Measuring social impact of LLM-generated policy recommendations
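As a rough illustration of what "alignment with experts" could mean in practice, the sketch below computes a simple agreement rate: the share of decision scenarios in which an LLM picks the same policy option as a domain expert. This is only an assumption about one possible metric; the abstract does not specify how alignment is measured. The function name `agreement_rate`, the scenario keys, and the policy labels are all hypothetical, and the example data is invented for demonstration.

```python
def agreement_rate(llm_choices, expert_choices):
    """Share of common scenarios where the LLM and the expert chose the same policy.

    Both arguments map a scenario key to the chosen policy label.
    This is a hypothetical metric, not the paper's stated method.
    """
    shared = llm_choices.keys() & expert_choices.keys()
    if not shared:
        raise ValueError("no scenarios in common")
    matches = sum(llm_choices[s] == expert_choices[s] for s in shared)
    return matches / len(shared)


# Invented example data: (region, scenario id) -> policy label.
expert = {
    ("South Bend", 1): "A",
    ("Barcelona", 1): "B",
    ("Johannesburg", 1): "A",
    ("Macau", 1): "C",
}
llm = {
    ("South Bend", 1): "A",
    ("Barcelona", 1): "B",
    ("Johannesburg", 1): "C",
    ("Macau", 1): "C",
}

rate = agreement_rate(llm, expert)
print(rate)  # 3 of 4 scenarios match
```

Raw agreement does not correct for matches that occur by chance, so a chance-adjusted statistic may be preferable when the number of policy options per scenario is small.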
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel benchmark with decision scenarios across geographies
Automated pipeline connecting policies to agent-based model
Simulated social scenarios to explore policy impact
Pierre Le Coz
United Nations University Institute in Macau, Macau SAR, China
Jia An Liu
United Nations University Institute in Macau, Macau SAR, China
Debarun Bhattacharjya
Researcher, IBM T.J. Watson Research Center
artificial intelligence, decision analysis, machine learning, probabilistic modeling
Georgina Curto
United Nations University Institute in Macau, Macau SAR, China
Serge Stinckwich
United Nations University Institute in Macau, Macau SAR, China