COINBench: Moving Beyond Individual Perspectives to Collective Intent Understanding

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited capability of large language models (LLMs) to understand collective intent: despite strong performance on individual instructions, they struggle to extract consensus, resolve contradictions, and infer latent trends from multi-source public discourse. To bridge this gap, the authors introduce COIN-BENCH, a dynamic, live-updating evaluation benchmark for collective intent understanding in the consumer domain, built around a hierarchical cognitive architecture (COIN-TREE) and a retrieval-augmented verification mechanism (COIN-RAG), and paired with a hybrid assessment framework that combines rule-based metrics with LLM-as-Judge evaluation. Benchmarking 20 state-of-the-art LLMs reveals that current models largely succeed only at surface-level aggregation and struggle with the deep synthesis required for complex collective intent, underscoring the need to move beyond mere instruction following toward expert-level analytical agency.
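To make the "hierarchical cognitive architecture" concrete, below is a minimal sketch of what a COIN-TREE-style intent hierarchy could look like, assuming a simple tree of intent levels running from explicit scenarios down to causal hypotheses. The node fields, level names, and example content are illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical intent-hierarchy node, loosely inspired by the paper's description
# of COIN-TREE ("from explicit scenarios to deep causal reasoning").
# Field names and level labels are assumptions made for illustration only.
@dataclass
class IntentNode:
    level: str                                            # e.g. "scenario", "preference", "causal"
    statement: str                                         # the distilled collective claim at this level
    evidence: List[str] = field(default_factory=list)      # supporting discussion snippets
    children: List["IntentNode"] = field(default_factory=list)

def flatten(node: IntentNode, depth: int = 0):
    """Walk the tree, yielding (depth, level, statement) for inspection."""
    yield depth, node.level, node.statement
    for child in node.children:
        yield from flatten(child, depth + 1)

# Toy example: a surface-level scenario, an aggregated preference, and a deeper causal hypothesis.
root = IntentNode(
    level="scenario",
    statement="Users discuss battery life of a new phone",
    children=[
        IntentNode(
            level="preference",
            statement="Most reviewers want longer standby time",
            evidence=["'barely lasts a day'", "'great screen but needs a bigger battery'"],
            children=[
                IntentNode(
                    level="causal",
                    statement="Complaints cluster around always-on display drain",
                ),
            ],
        ),
    ],
)

for depth, level, statement in flatten(root):
    print("  " * depth + f"[{level}] {statement}")
```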

📝 Abstract
Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill Collective Intent - the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions - remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, real-world, live-updating benchmark specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines a rule-based method with an LLM-as-the-Judge approach. This framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions - depth, breadth, informativeness, and correctness - reveals that while current models can handle surface-level aggregation, they still struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See our project page on COIN-BENCH.
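The abstract outlines a hybrid pipeline that combines a rule-based method with LLM-as-Judge scoring across four dimensions (depth, breadth, informativeness, correctness). The sketch below shows one plausible way such a combination could be wired up, assuming a keyword-coverage rule metric, an externally supplied judge callable, and an equal-weight blend; none of these specific choices are taken from the paper.

```python
from typing import Callable, Dict, List

# The four evaluation dimensions named in the abstract.
DIMENSIONS = ["depth", "breadth", "informativeness", "correctness"]

def rule_based_coverage(answer: str, reference_points: List[str]) -> float:
    """Cheap rule-based metric: fraction of reference intent points mentioned verbatim.
    A stand-in for the benchmark's rule-based component, which is not specified here."""
    if not reference_points:
        return 0.0
    hits = sum(1 for point in reference_points if point.lower() in answer.lower())
    return hits / len(reference_points)

def hybrid_score(
    answer: str,
    reference_points: List[str],
    judge: Callable[[str, str], float],   # caller supplies an LLM-as-Judge scorer returning [0, 1]
    weight_rule: float = 0.5,             # equal-weight blend is an assumption, not the paper's setting
) -> Dict[str, float]:
    """Blend the rule-based coverage score with per-dimension LLM-judge scores."""
    rule = rule_based_coverage(answer, reference_points)
    scores = {}
    for dim in DIMENSIONS:
        judged = judge(answer, dim)
        scores[dim] = weight_rule * rule + (1.0 - weight_rule) * judged
    return scores

# Usage with a dummy judge; a real setup would prompt an LLM and parse its rating.
dummy_judge = lambda answer, dim: 0.6
print(hybrid_score("Reviewers want longer battery life.", ["battery life"], dummy_judge))
```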
Problem

Research questions and friction points this paper is trying to address.

Collective Intent
Large Language Models
Intent Understanding
Public Discourse
Cognitive Reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collective Intent
COIN-BENCH
Hierarchical Cognitive Structuring
Retrieval-Augmented Verification
LLM-as-Judge
Xiaozhe Li (Tongji University)
Tianyi Lyu (Tongji University)
Siyi Yang (Tongji University)
Yizhao Yang (Tongji University)
Yuxi Gong (Tongji University)
Jinxuan Huang (Tongji University)
Ligao Zhang (CurrentsAI Research)
Zhuoyi Huang (Stanford University, CurrentsAI Research)
Qingwen Liu (Tongji University)