🤖 AI Summary
This study systematically evaluates the efficacy and limitations of large language models (LLMs) as assistants to human experts in end-to-end event annotation workflows, specifically denoising, clustering, and variable labeling. Addressing the high cost of expert annotation and the low reliability of existing automated methods, we propose an LLM-augmented annotation framework and benchmark it against TF-IDF-based and event-set baselines. Results show that LLM assistance significantly reduces expert annotation time and cognitive load while achieving higher variable adoption rates than fully automated approaches. However, LLMs' standalone annotation accuracy remains substantially below human gold-standard performance, confirming that they cannot replace domain experts. Our key contribution is the first empirical characterization of LLMs' upper bound in full-pipeline event annotation: they enhance consistency and efficiency as collaborative tools, but do not supplant expert judgment.
📝 Abstract
Event annotation is important for identifying market changes, monitoring breaking news, and understanding sociological trends. Although expert annotators set the gold standard, human coding is expensive and inefficient. Unlike information extraction experiments that focus on a single context, we evaluate a holistic workflow that removes irrelevant documents, merges documents about the same event, and annotates the events. Although LLM-based automated annotation outperforms traditional TF-IDF-based methods and Event Set Curation, LLMs are still not reliable annotators compared to human experts. However, using LLMs to assist experts with Event Set Curation can reduce the time and mental effort required for Variable Annotation. When LLMs extract event variables to assist expert annotators, the annotators agree more with these extracted variables than with those produced by fully automated LLM annotation.