Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

📅 2025-01-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the neglect of guesstimation (approximate quantity estimation) in research on large language models (LLMs) and vision-language models (VLMs). It proposes a Wisdom of Crowds (WOC) decoding strategy: sample multiple estimates from the model and aggregate them by taking the median, which is robust to outlying guesses. It also introduces MARBLES, a novel multimodal (image-text) guesstimation dataset. The findings are threefold: (1) LLMs/VLMs perform well on guesstimation, suggesting that they possess some level of world model, so guesstimation can serve as a probe of that capability; (2) WOC decoding improves estimation accuracy; and (3) adding images to the prompt improves performance further.

📝 Abstract

Guesstimation, the task of making approximate quantity estimates, is a common real-world challenge. However, it has been largely overlooked in large language model (LLM) and vision-language model (VLM) research. We introduce a novel guesstimation dataset, MARBLES. This dataset requires one to estimate how many items (e.g., marbles) can fit into containers (e.g., a one-cup measuring cup), both with and without accompanying images. Inspired by the social science concept of the "Wisdom of Crowds" (WOC; taking the median of estimates from a crowd), which has proven effective in guesstimation, we propose a "WOC decoding" strategy for LLM guesstimation. We show that LLMs/VLMs perform well on guesstimation, suggesting that they possess some level of a "world model" necessary for guesstimation. Moreover, similar to human performance, the WOC decoding method improves LLM/VLM guesstimation accuracy. Furthermore, the inclusion of images in the multimodal condition enhances model performance. These results highlight the value of the WOC decoding strategy for LLMs/VLMs and position guesstimation as a probe for evaluating LLMs/VLMs' world models.
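
The core of WOC decoding as the abstract describes it (draw several estimates, take the median) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `sample_fn` is a hypothetical stand-in for one stochastic model call, and the number of samples, the number-parsing rule, and the handling of unparseable responses are all assumptions that the abstract does not specify.

```python
import re
import statistics
from typing import Callable, List, Optional

# Matches the first integer or decimal in a response, allowing thousands commas.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_estimate(text: str) -> Optional[float]:
    """Return the first number found in a model response, or None if there is none."""
    match = _NUMBER.search(text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def woc_decode(sample_fn: Callable[[str], str], prompt: str, n_samples: int = 10) -> float:
    """Query the model n_samples times and return the median of the parsed estimates.

    sample_fn is a hypothetical callable that returns one sampled text response
    per call; responses without a number are skipped (an assumption).
    """
    estimates: List[float] = []
    for _ in range(n_samples):
        value = parse_estimate(sample_fn(prompt))
        if value is not None:
            estimates.append(value)
    if not estimates:
        raise ValueError("no numeric estimate could be parsed from any sample")
    return statistics.median(estimates)


if __name__ == "__main__":
    # Canned responses stand in for a real model so the sketch runs offline.
    canned = iter(["About 120 marbles", "150", "I'd guess 1,000", "no idea", "130"])
    answer = woc_decode(
        lambda _prompt: next(canned),
        "How many marbles fit in a one-cup measuring cup?",
        n_samples=5,
    )
    print(answer)
```

With these canned responses the parsed estimates are 120, 150, 1000, and 130, so the median is 140.0 while the mean would be 350.0. That gap illustrates why a median-based aggregate is less sensitive to a single wild guess.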

Problem

Research questions and friction points this paper is trying to address.

Improving guesstimation in LLMs using Wisdom of Crowds

Evaluating LLMs' world models through guesstimation tasks

Enhancing LLM accuracy with multimodal image data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MARBLES dataset

Proposes WOC decoding strategy

Enhances accuracy with images

🔎 Similar Papers

Self-Alignment: Improving Alignment of Cultural Values in LLMs via In-Context Learning