🤖 AI Summary
This work investigates the capabilities of large language models (LLMs) in participatory budgeting (PB), a canonical structured resource-allocation problem, focusing on inferring implicit preferences from natural-language inputs and generating near-optimal project allocations under budget constraints. PB is cast as a dynamic evaluation framework, enabling the first end-to-end joint modeling of preference inference and resource allocation without reliance on explicit voting mechanisms. Three prompting strategies are designed: greedy selection, direct optimization, and hill-climbing–inspired prompting. Performance is rigorously evaluated against a utility-maximizing oracle. Results demonstrate that prompt engineering substantially affects outcome quality: LLMs effectively parse unstructured text, reconstruct structured preferences, and produce allocations achieving 85%–92% of the oracle's utility. This supports LLMs' potential for mechanism-design–driven reasoning tasks that require structured decision-making from linguistic inputs.
📝 Abstract
Large Language Models (LLMs) are increasingly expected to handle complex decision-making tasks, yet their ability to perform structured resource allocation remains underexplored. Evaluating their reasoning is also difficult due to data contamination and the static nature of existing benchmarks. We present a dual-purpose framework leveraging Participatory Budgeting (PB) both as (i) a practical setting for LLM-based resource allocation and (ii) an adaptive benchmark for evaluating their reasoning capabilities. We task LLMs with selecting project subsets under feasibility (e.g., budget) constraints via three prompting strategies: greedy selection, direct optimization, and a hill-climbing-inspired refinement. We benchmark LLMs' allocations against a utility-maximizing oracle. We also test whether LLMs can infer structured preferences from natural-language voter input or metadata, without explicit votes. By comparing allocations based on inferred preferences to those from ground-truth votes, we evaluate LLMs' ability to extract preferences from open-ended input. Our results underscore the role of prompt design and show that LLMs hold promise for mechanism design with unstructured inputs.
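The oracle benchmark described above amounts to budget-constrained utility maximization, i.e., a 0/1 knapsack over candidate projects. A minimal sketch with hypothetical project data (names, costs, and utilities are illustrative, not from the paper) contrasts an exhaustive oracle with the greedy-by-value-density heuristic that the greedy prompting strategy mirrors:

```python
from itertools import combinations

# Hypothetical PB instance: project costs and aggregate voter utilities.
costs = {"stadium": 51, "library": 50, "park": 50}
utils = {"stadium": 100, "library": 60, "park": 60}
BUDGET = 100

def oracle(costs, utils, budget):
    """Utility-maximizing oracle: exhaustive search over feasible subsets.
    Fine for small instances; larger ones would use knapsack DP."""
    best, best_util = frozenset(), 0
    projects = list(costs)
    for r in range(len(projects) + 1):
        for subset in combinations(projects, r):
            if sum(costs[p] for p in subset) <= budget:
                u = sum(utils[p] for p in subset)
                if u > best_util:
                    best, best_util = frozenset(subset), u
    return best, best_util

def greedy(costs, utils, budget):
    """Greedy baseline: add projects by utility-per-cost while budget allows."""
    chosen, spent = set(), 0
    for p in sorted(costs, key=lambda p: utils[p] / costs[p], reverse=True):
        if spent + costs[p] <= budget:
            chosen.add(p)
            spent += costs[p]
    return chosen, sum(utils[p] for p in chosen)

opt_set, opt_util = oracle(costs, utils, BUDGET)
grd_set, grd_util = greedy(costs, utils, BUDGET)
print(opt_util, grd_util, round(grd_util / opt_util, 2))  # → 120 100 0.83
```

Here greedy locks in the high-density "stadium" and can no longer afford the pair that the oracle selects, giving the kind of utility ratio (here 0.83) that the paper reports when comparing LLM allocations to the oracle.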