AI Summary
Problem: Single-response interfaces for large language models (LLMs) can obscure the models' probabilistic, non-deterministic nature, fostering user overtrust and anthropomorphism and undermining appropriately calibrated reliance.
Method: We propose a cognitive support paradigm that combines multi-response visualization with semantic similarity annotation. Multiple responses are generated via LLM sampling, and pairwise semantic and structural similarities between them are computed automatically. A within-subjects experiment (N=XX) evaluated the design across three conditions (one response; ten responses with cognitive support; ten responses without) using NASA-TLX, trust/reliance, and anthropomorphism scales.
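As a rough illustration of this pipeline, the sketch below computes pairwise semantic similarity over a set of sampled responses. The paper does not specify its embedding model or similarity metric; `sentence-transformers`, the `all-MiniLM-L6-v2` checkpoint, and cosine similarity are illustrative assumptions.

```python
# Minimal sketch of the pairwise semantic-similarity step, assuming
# sentence-transformers embeddings and cosine similarity; the paper's
# actual models and metrics are not specified here.
from itertools import combinations

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def pairwise_semantic_similarity(responses: list[str]) -> dict[tuple[int, int], float]:
    """Embed each sampled response and score every pair by cosine similarity."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode(responses)
    scores = cosine_similarity(embeddings)
    return {
        (i, j): float(scores[i, j])
        for i, j in combinations(range(len(responses)), 2)
    }

# Ten responses sampled from the same prompt (temperature > 0), matching
# the ten-response conditions in the study.
responses = [f"sampled response {k}" for k in range(10)]
similarities = pairwise_semantic_similarity(responses)
```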
Contribution/Results: This is the first systematic empirical validation demonstrating that the paradigm simultaneously calibrates user trust and suppresses anthropomorphism. Results show significantly improved awareness of LLMs' probabilistic behavior, reduced inappropriate trust and anthropomorphic attribution, and no increase in subjective cognitive load. The work establishes a novel, empirically grounded paradigm for explainable human-LLM interaction under uncertainty.
Abstract
Interfaces for interacting with large language models (LLMs) are often designed to mimic human conversation, typically presenting a single response to each user query. This design choice can obscure the probabilistic, predictive nature of these models, potentially fostering undue trust in and over-anthropomorphization of the underlying model. In this paper, we investigate (i) the effect of displaying multiple responses simultaneously as a countermeasure to these issues, and (ii) how a cognitive support mechanism that highlights structural and semantic similarities across responses helps users manage the increased cognitive load of that intervention. We conducted a within-subjects study in which participants inspected responses generated by an LLM under three conditions: one response, ten responses with cognitive support, and ten responses without cognitive support. Participants then answered questions about workload, trust and reliance, and anthropomorphization. We conclude by reporting the results of this study and discussing future work and design opportunities for LLM interfaces.
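The structural-similarity highlighting could, for example, surface word sequences shared across responses. The abstract does not describe the highlighting algorithm; the `difflib`-based sketch below is one hypothetical possibility.

```python
# Hypothetical sketch of a structural-similarity annotation: word sequences
# shared by two sampled responses, found with difflib. The study's actual
# highlighting method is not specified here.
from difflib import SequenceMatcher

def shared_spans(a: str, b: str, min_words: int = 3) -> list[str]:
    """Return word sequences of at least `min_words` appearing in both responses."""
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(a=a_words, b=b_words)
    return [
        " ".join(a_words[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]
```

Spans returned by such a function could then be highlighted in each of the ten displayed responses, letting users see at a glance where the sampled outputs agree.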