🤖 AI Summary
In real-world dialogues, information access often happens implicitly, yet existing research focuses predominantly on explicit queries and overlooks implicit, check-worthy factual assertions. Method: This paper characterizes the phenomenon of “implicit information access” and introduces WildClaims, a dataset constructed from WildChat, a corpus of user-ChatGPT conversations, in which factual claims made by the system are extracted and annotated for check-worthiness. Contribution/Results: WildClaims contains 121,905 extracted claims from 7,587 utterances across 3,000 conversations. Empirical analysis shows that, conservatively, 18% to 51% of conversations contain check-worthy assertions (depending on the method used), and under a less conservative estimate as many as 76% may contain them, underscoring the ubiquity of the phenomenon. As a benchmark grounded in authentic conversational data, WildClaims enables systematic study of implicit information access and rigorous evaluation of dialogue systems’ factual reliability.
📝 Abstract
The rapid advancement of Large Language Models (LLMs) has transformed conversational systems into practical tools used by millions. However, the nature and necessity of information retrieval in real-world conversations remain largely unexplored, as research has focused predominantly on traditional, explicit information access conversations. The central question is: What do real-world information access conversations look like? To this end, we first conduct an observational study on the WildChat dataset, a large-scale collection of user-ChatGPT conversations, finding that users' access to information occurs implicitly as check-worthy factual assertions made by the system, even when the conversation's primary intent is non-informational, such as creative writing. To enable the systematic study of this phenomenon, we release the WildClaims dataset, a novel resource consisting of 121,905 factual claims extracted from 7,587 utterances in 3,000 WildChat conversations, each annotated for check-worthiness. Our preliminary analysis of this resource reveals that, conservatively, 18% to 51% of conversations contain check-worthy assertions, depending on the methods employed, and less conservatively, as many as 76% may contain such assertions. This high prevalence underscores the importance of moving beyond the traditional understanding of explicit information access to address the implicit information access that arises in real-world user-system conversations.