From Chat Logs to Collective Insights: Aggregative Question Answering

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Extracting macro-level collective insights, such as emerging concerns and population-wide trends, from massive, heterogeneous user-chatbot dialogue logs remains a fundamental challenge because it requires aggregation and reasoning across dialogues at scale. Method: We introduce Aggregative Question Answering (AQA), a novel task requiring models to jointly reason over thousands of dialogues; propose WildChat-AQA, the first benchmark grounded in real-world data, comprising 6,027 aggregative questions derived from 182,330 dialogues; and develop an LLM-based framework featuring cross-dialogue attention, hierarchical sampling with summary distillation, and population-level semantic alignment. Contribution/Results: Experiments reveal that state-of-the-art methods achieve below 32% average accuracy on WildChat-AQA, with near-total failure on long-tail questions, exposing critical limitations in scalability and collective reasoning. This work establishes both a foundational benchmark and a methodological framework for scalable, efficient extraction of collective insights.

📝 Abstract
Conversational agents powered by large language models (LLMs) are rapidly becoming integral to our daily interactions, generating unprecedented amounts of conversational data. Such datasets offer a powerful lens into societal interests, trending topics, and collective concerns. Yet, existing approaches typically treat these interactions as independent and miss critical insights that could emerge from aggregating and reasoning across large-scale conversation logs. In this paper, we introduce Aggregative Question Answering, a novel task requiring models to reason explicitly over thousands of user-chatbot interactions to answer aggregative queries, such as identifying emerging concerns among specific demographics. To enable research in this direction, we construct a benchmark, WildChat-AQA, comprising 6,027 aggregative questions derived from 182,330 real-world chatbot conversations. Experiments show that existing methods either struggle to reason effectively or incur prohibitive computational costs, underscoring the need for new approaches capable of extracting collective insights from large-scale conversational data.
Problem

Research questions and friction points this paper is trying to address.

Extracting collective insights from large-scale conversational data
Answering aggregative queries across thousands of user-chatbot interactions
Overcoming computational challenges in reasoning over chat logs
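To make the task concrete: an aggregative query asks about a population of dialogues rather than any single one. A minimal sketch, assuming a hypothetical record schema (`group` and `topic` fields extracted per dialogue; the benchmark's actual metadata may differ), might answer "what is the most common concern among a demographic" by counting across logs:

```python
from collections import Counter

# Hypothetical dialogue records: each log carries a demographic label
# and an extracted topic (illustrative only, not WildChat-AQA's schema).
dialogues = [
    {"group": "students", "topic": "exam stress"},
    {"group": "students", "topic": "visa rules"},
    {"group": "students", "topic": "exam stress"},
    {"group": "developers", "topic": "debugging"},
]

def top_concerns(logs, group, k=1):
    """Answer an aggregative query: the k most frequent topics
    among dialogues from the given demographic group."""
    counts = Counter(d["topic"] for d in logs if d["group"] == group)
    return [topic for topic, _ in counts.most_common(k)]

print(top_concerns(dialogues, "students"))  # → ['exam stress']
```

The paper's point is that at 182,330 real dialogues no such clean labels exist, so a model must do this extraction and aggregation jointly, which is where existing methods break down.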
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregative reasoning over large-scale chat logs
Novel task: Aggregative Question Answering
Benchmark WildChat-AQA with 6,027 questions