AI Summary
Problem: Single-response interfaces for large language models (LLMs) can obscure the models' probabilistic, non-deterministic nature, fostering user overtrust and anthropomorphism and undermining appropriately calibrated reliance.
Method: We propose a cognitive support paradigm that combines multi-response visualization with semantic similarity annotation. Multiple responses are generated via LLM sampling, and pairwise semantic and structural similarities between them are computed automatically. A within-subjects experiment (N=XX) evaluated the design across three conditions (one response; ten responses with cognitive support; ten responses without) using NASA-TLX, trust/reliance, and anthropomorphism scales.
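As a rough illustration of this pipeline, the sketch below computes pairwise semantic similarity over a set of sampled responses. The paper does not specify its embedding model or similarity metric; `sentence-transformers`, the `all-MiniLM-L6-v2` checkpoint, and cosine similarity are illustrative assumptions.

```python
# Minimal sketch of the pairwise semantic-similarity step, assuming
# sentence-transformers embeddings and cosine similarity; the paper's
# actual models and metrics are not specified here.
from itertools import combinations

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

def pairwise_semantic_similarity(responses: list[str]) -> dict[tuple[int, int], float]:
    """Embed each sampled response and score every pair by cosine similarity."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    embeddings = model.encode(responses)
    scores = cosine_similarity(embeddings)
    return {
        (i, j): float(scores[i, j])
        for i, j in combinations(range(len(responses)), 2)
    }

# Ten responses sampled from the same prompt (temperature > 0), matching
# the ten-response conditions in the study.
responses = [f"sampled response {k}" for k in range(10)]
similarities = pairwise_semantic_similarity(responses)
```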
Contribution/Results: This is the first systematic empirical validation demonstrating that the paradigm simultaneously calibrates user trust and suppresses anthropomorphism. Results show significantly improved awareness of LLMs' probabilistic behavior, reduced inappropriate trust and anthropomorphic attribution, and no increase in subjective cognitive load. The work establishes a novel, empirically grounded paradigm for explainable human-LLM interaction under uncertainty.
Abstract
Interfaces for interacting with large language models (LLMs) are often designed to mimic human conversation, typically presenting a single response to each user query. This design choice can obscure the probabilistic, predictive nature of these models, potentially fostering undue trust in and over-anthropomorphization of the underlying model. In this paper, we investigate (i) the effect of displaying multiple responses simultaneously as a countermeasure to these issues, and (ii) how a cognitive support mechanism that highlights structural and semantic similarities across responses helps users manage the increased cognitive load of that intervention. We conducted a within-subjects study in which participants inspected responses generated by an LLM under three conditions: one response, ten responses with cognitive support, and ten responses without cognitive support. Participants then answered questions about workload, trust and reliance, and anthropomorphization. We conclude by reporting the results of this study and discussing future work and design opportunities for LLM interfaces.
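The structural-similarity highlighting could, for example, surface word sequences shared across responses. The abstract does not describe the highlighting algorithm; the `difflib`-based sketch below is one hypothetical possibility.

```python
# Hypothetical sketch of a structural-similarity annotation: word sequences
# shared by two sampled responses, found with difflib. The study's actual
# highlighting method is not specified here.
from difflib import SequenceMatcher

def shared_spans(a: str, b: str, min_words: int = 3) -> list[str]:
    """Return word sequences of at least `min_words` appearing in both responses."""
    a_words, b_words = a.split(), b.split()
    matcher = SequenceMatcher(a=a_words, b=b_words)
    return [
        " ".join(a_words[m.a : m.a + m.size])
        for m in matcher.get_matching_blocks()
        if m.size >= min_words
    ]
```

Spans returned by such a function could then be highlighted in each of the ten displayed responses, letting users see at a glance where the sampled outputs agree.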