🤖 AI Summary
This work addresses the limited capability of large language models (LLMs) in understanding collective intent—specifically, their difficulty in extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discourse, despite strong performance on individual instructions. To bridge this gap, the authors introduce COIN-BENCH, the first dynamic evaluation benchmark for collective intent understanding, featuring a hierarchical cognitive architecture (COIN-TREE) and a retrieval-augmented verification mechanism (COIN-RAG), alongside a hybrid assessment framework combining rule-based metrics and LLM-as-Judge evaluation. Benchmarking 20 state-of-the-art LLMs reveals that current models largely succeed only at surface-level aggregation and struggle with deep synthesis of complex collective intentions, underscoring the need to evolve beyond mere instruction-following toward expert-level analytical agency.
📝 Abstract
Understanding human intent is a high-level cognitive challenge for Large Language Models (LLMs), requiring sophisticated reasoning over noisy, conflicting, and non-linear discourse. While LLMs excel at following individual instructions, their ability to distill Collective Intent (the process of extracting consensus, resolving contradictions, and inferring latent trends from multi-source public discussions) remains largely unexplored. To bridge this gap, we introduce COIN-BENCH, a dynamic, live-updating benchmark grounded in real-world data and specifically designed to evaluate LLMs on collective intent understanding within the consumer domain. Unlike traditional benchmarks that focus on transactional outcomes, COIN-BENCH operationalizes intent as a hierarchical cognitive structure, ranging from explicit scenarios to deep causal reasoning. We implement a robust evaluation pipeline that combines a rule-based method with an LLM-as-Judge approach. This framework incorporates COIN-TREE for hierarchical cognitive structuring and retrieval-augmented verification (COIN-RAG) to ensure expert-level precision in analyzing raw, collective human discussions. An extensive evaluation of 20 state-of-the-art LLMs across four dimensions (depth, breadth, informativeness, and correctness) reveals that while current models can handle surface-level aggregation, they still struggle with the analytical depth required for complex intent synthesis. COIN-BENCH establishes a new standard for advancing LLMs from passive instruction followers to expert-level analytical agents capable of deciphering the collective voice of the real world. See our project page on COIN-BENCH.
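The abstract describes a hybrid pipeline that blends a rule-based metric with an LLM-as-Judge score over four dimensions (depth, breadth, informativeness, correctness). The paper does not specify how these signals are combined; the sketch below is an illustrative assumption, with a hypothetical F1-style rule metric, a stubbed judge, and an arbitrary 50/50 weighting, not the actual COIN-BENCH implementation.

```python
# Hedged sketch of a hybrid rule-based + LLM-as-Judge scorer.
# All names, weights, and the stubbed judge are illustrative assumptions.

DIMENSIONS = ("depth", "breadth", "informativeness", "correctness")

def rule_based_score(prediction: set, reference: set) -> float:
    """F1 overlap between predicted and reference intent labels (assumed metric)."""
    if not prediction and not reference:
        return 1.0
    tp = len(prediction & reference)
    precision = tp / len(prediction) if prediction else 0.0
    recall = tp / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def judge_scores(answer: str) -> dict:
    """Stub standing in for an LLM judge returning 0-1 scores per dimension.

    A real pipeline would prompt a judge model with a rubric; here we
    return fixed placeholder values so the sketch is runnable.
    """
    return {d: 0.8 for d in DIMENSIONS}

def hybrid_score(prediction: set, reference: set, answer: str,
                 rule_weight: float = 0.5) -> float:
    """Weighted blend of the rule-based metric and the mean judge score."""
    rule = rule_based_score(prediction, reference)
    judge = sum(judge_scores(answer).values()) / len(DIMENSIONS)
    return rule_weight * rule + (1 - rule_weight) * judge
```

In this toy setup, a prediction sharing one of two labels with the reference yields a rule F1 of 0.5, which averages with the stubbed judge score of 0.8 into a hybrid score of 0.65.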