LLM Benchmark-User Need Misalignment for Climate Change

📅 2026-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a significant misalignment between current large language model (LLM) evaluation benchmarks and users’ real-world knowledge needs regarding climate change. The authors propose an “active knowledge behavior” framework, introducing a Topic-Intent-Form triadic classification system to systematically analyze interaction patterns in both human–human and human–LLM climate knowledge exchanges through integrated qualitative and quantitative methods. The research reveals, for the first time, a structural discrepancy between existing benchmarks and actual user demands, while also demonstrating that LLM interaction patterns closely resemble those of human-to-human interactions. These findings offer actionable theoretical insights and concrete directions for improving future LLM evaluation benchmarks, optimizing retrieval-augmented generation (RAG) systems, and refining LLM training strategies.
📝 Abstract
Climate change is a major socio-scientific issue that shapes public decision-making and policy discussions. As large language models (LLMs) increasingly serve as an interface for accessing climate knowledge, whether existing benchmarks reflect user needs is critical for evaluating LLMs in real-world settings. We propose a Proactive Knowledge Behaviors Framework that captures the different knowledge-seeking and knowledge-provision behaviors in human–human and human–AI interactions. We further develop a Topic-Intent-Form taxonomy and apply it to analyze climate-related data representing different knowledge behaviors. Our results reveal a substantial mismatch between current benchmarks and real-world user needs, while knowledge interaction patterns between humans and LLMs closely resemble those in human-human interactions. These findings provide actionable guidance for benchmark design, RAG system development, and LLM training. Code is available at https://github.com/OuchengLiu/LLM-Misalign-Climate-Change.
Problem

Research questions and friction points this paper is trying to address.

LLM benchmark
user need misalignment
climate change
knowledge behavior
evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proactive Knowledge Behaviors Framework
Topic-Intent-Form taxonomy
LLM benchmark misalignment
climate change knowledge interaction
user need alignment
Oucheng Liu
School of Computing, The Australian National University, Canberra, Australia
Lexing Xie
School of Computing, The Australian National University, Canberra, Australia
Jing Jiang
School of Computing, ANU; School of Computing and Information Systems, SMU
Natural Language Processing · Text Mining · Machine Learning