Annotation Quality in Aspect-Based Sentiment Analysis: A Case Study Comparing Experts, Students, Crowdworkers, and Large Language Models

📅 2026-05-05

🤖 AI Summary

This study addresses the scarcity of high-quality annotated data for German aspect-based sentiment analysis (ABSA) and the unclear impact of annotation sources on model performance. It presents the first systematic comparison of annotation quality among experts, students, crowdworkers, and large language models (LLMs) in the German ABSA context. The authors construct a gold-standard dataset through expert re-annotation and evaluate each annotation source on two core tasks: aspect category sentiment analysis (ACSA) and target aspect sentiment detection (TASD). Using state-of-the-art BERT-, T5-, and LLaMA-based models, with both fine-tuning and in-context learning with instruction prompts, the experiments demonstrate that expert annotations yield significantly higher consistency and downstream task performance. The study also quantifies the trade-offs of using LLM-generated and non-expert annotations under resource-constrained conditions, highlighting their practical feasibility alongside their inherent limitations.
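To make the difference between the two tasks concrete, the following is a minimal sketch, not the authors' evaluation code. It assumes that a TASD annotation is a (target term, aspect category, polarity) triplet, that an ACSA annotation is the same item without the target term, and that an annotation source is scored against the gold standard with a micro-averaged F1 over exact tuple matches. The example sentence, the category names, and the helper names `tuple_f1` and `to_acsa` are all hypothetical.

```python
def tuple_f1(gold, predicted):
    """Micro-averaged F1 over per-sentence sets of annotation tuples."""
    tp = fp = fn = 0
    for gold_items, pred_items in zip(gold, predicted):
        g, p = set(gold_items), set(pred_items)
        tp += len(g & p)   # tuples found in both gold and prediction
        fp += len(p - g)   # predicted tuples missing from gold
        fn += len(g - p)   # gold tuples the prediction missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def to_acsa(sentences):
    """Drop the target term: TASD triplets become ACSA (category, polarity) pairs."""
    return [[(category, polarity) for _, category, polarity in s] for s in sentences]


# Hypothetical sentence: "Das Essen war lecker, aber der Service war langsam."
gold_tasd = [[("Essen", "FOOD", "positive"), ("Service", "SERVICE", "negative")]]
# Hypothetical annotator output with a wrong span for the second target.
pred_tasd = [[("Essen", "FOOD", "positive"), ("der Service", "SERVICE", "negative")]]

tasd_f1 = tuple_f1(gold_tasd, pred_tasd)
acsa_f1 = tuple_f1(to_acsa(gold_tasd), to_acsa(pred_tasd))
print(tasd_f1, acsa_f1)  # 0.5 1.0
```

In this toy case the span error costs a TASD match but leaves the ACSA score untouched, which is one reason the two tasks can rank annotation sources differently.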

📝 Abstract

Aspect-Based Sentiment Analysis (ABSA) enables fine-grained opinion analysis by identifying sentiments toward specific aspects or targets within a text. While ABSA has been widely studied for English, research on other languages such as German remains limited, largely due to the lack of high-quality annotated datasets. This paper examines how different annotation sources influence the development of German ABSA. To this end, an existing dataset is re-annotated by experts to establish a ground truth, which serves as a reference for evaluating annotations produced by students, crowdworkers, Large Language Models (LLMs), and experts. Annotation quality is compared using Inter-Annotator Agreement (IAA) and its impact on downstream model performance for different ABSA subtasks. The evaluation focuses on Aspect Category Sentiment Analysis (ACSA) and Target Aspect Sentiment Detection (TASD). We apply State-of-the-Art (SOTA) methods for ABSA, including BERT-, T5-, and LLaMA-based approaches to assess performance differences, spanning fine-tuning and in-context learning with instruction prompts. The findings provide practical insights into trade-offs between annotation reliability and efficiency, offering guidance for dataset construction in under-resourced Natural Language Processing (NLP) scenarios.
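The abstract does not say which IAA coefficient is used, so the following is only an illustrative sketch of one common choice, Cohen's kappa for two annotators, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohen_kappa` and the label lists are hypothetical and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical polarity labels for six aspect mentions.
expert = ["positive", "negative", "neutral", "positive", "negative", "positive"]
student = ["positive", "negative", "positive", "positive", "neutral", "positive"]

kappa = cohen_kappa(expert, student)
print(round(kappa, 3))  # 0.429
```

Here the annotators agree on four of six items (p_o ≈ 0.667), but chance agreement is already about 0.417, so kappa is only about 0.43. With more than two annotators per item, a coefficient such as Fleiss' kappa or Krippendorff's alpha would be the usual alternative.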

Problem

Research questions and friction points this paper is trying to address.

Aspect-Based Sentiment Analysis

Annotation Quality

German NLP

Inter-Annotator Agreement

Under-resourced Languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

Aspect-Based Sentiment Analysis

Annotation Quality

Large Language Models