Probing LLM World Models: Enhancing Guesstimation with Wisdom of Crowds Decoding

📅 2025-01-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low accuracy and limited interpretable world-modeling capability of large language models (LLMs) and vision-language models (VLMs) on guesstimation (approximate quantitative estimation) tasks. To mitigate these limitations, we propose a Wisdom of Crowds (WOC)-based decoding strategy: generating diverse estimates via multi-round sampling and aggregating them via median selection to improve robustness. We introduce MARBLES, the first multimodal (image-text) guesstimation benchmark, and systematically integrate WOC into LLM/VLM decoding for the first time. Our findings are threefold: (1) WOC substantially improves estimation accuracy across models; (2) guesstimation serves as an effective probe of implicit world-modeling competence; and (3) incorporating visual input further boosts performance, demonstrating the role of multimodal synergy in reasoning about physical magnitudes.

📝 Abstract
Guesstimation, the task of making approximate quantity estimates, is a common real-world challenge. However, it has been largely overlooked in research on large language models (LLMs) and vision-language models (VLMs). We introduce a novel guesstimation dataset, MARBLES. This dataset requires one to estimate how many items (e.g., marbles) can fit into containers (e.g., a one-cup measuring cup), both with and without accompanying images. Inspired by the social science concept of the "Wisdom of Crowds" (WOC), taking the median of estimates from a crowd, which has proven effective in guesstimation, we propose a "WOC decoding" strategy for LLM guesstimation. We show that LLMs/VLMs perform well on guesstimation, suggesting that they possess some level of a "world model" necessary for guesstimation. Moreover, similar to human performance, the WOC decoding method improves LLM/VLM guesstimation accuracy. Furthermore, the inclusion of images in the multimodal condition enhances model performance. These results highlight the value of the WOC decoding strategy for LLMs/VLMs and position guesstimation as a probe for evaluating LLMs/VLMs' world models.
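The WOC decoding described in the abstract, sampling several independent estimates and aggregating them with the median, can be sketched as a simple sample-and-aggregate loop. Below is a minimal illustration, not the paper's actual implementation: the `sample_estimate` callable is a hypothetical stand-in for querying an LLM at nonzero temperature, simulated here with lognormal noise around a made-up true count.

```python
import random
import statistics

def woc_decode(sample_estimate, n_samples=10):
    """Wisdom-of-Crowds decoding: draw several independent estimates
    and aggregate them with the median, which is robust to outliers."""
    estimates = [sample_estimate() for _ in range(n_samples)]
    return statistics.median(estimates)

# Hypothetical stand-in for multi-round LLM sampling: noisy estimates
# around a true quantity (e.g., marbles in a one-cup measuring cup).
random.seed(0)
true_count = 120  # illustrative value, not from the paper
noisy_llm = lambda: true_count * random.lognormvariate(0, 0.4)

single = noisy_llm()
crowd = woc_decode(noisy_llm, n_samples=15)
print(f"single sample: {single:.0f}, WOC median: {crowd:.0f}")
```

The median is preferred over the mean here because guesstimates are often right-skewed, so a single wild overestimate cannot drag the aggregate far off.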
Problem

Research questions and friction points this paper is trying to address.

Improving guesstimation in LLMs using Wisdom of Crowds
Evaluating LLMs' world models through guesstimation tasks
Enhancing LLM accuracy with multimodal image data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces MARBLES dataset
Proposes WOC decoding strategy
Enhances accuracy with images
Yun-Shiuan Chuang
University of Wisconsin-Madison, PayPal Inc.
Nikunj Harlalka
University of Wisconsin-Madison
Sameer Narendran
University of Wisconsin-Madison
Alexander Cheung
University of Wisconsin-Madison
Sizhe Gao
University of Wisconsin-Madison
Siddharth Suresh
Graduate Student, University of Wisconsin, Madison
Cognitive Science · Human-AI alignment · Large Language Models · Representation Learning
Junjie Hu
University of Wisconsin-Madison
Timothy T. Rogers
Professor of Psychology, University of Wisconsin-Madison
Cognitive neuroscience · memory · language · categorization · conceptual knowledge