🤖 AI Summary
This work addresses the scarcity of high-quality annotated data in aspect-based sentiment analysis (ABSA), and in particular the difficulty of keeping labels consistent when aspect terms must be generated. The authors propose an iterative generation-and-verification framework driven by large language model (LLM) agents: a closed loop of synthetic data creation and consistency validation that improves label fidelity. By combining iterative prompting with verification, the approach proves effective across multiple ABSA subtasks and benchmark datasets. Experiments show that the agent-augmented synthetic data achieves higher label retention rates than conventional prompting. Moreover, when this data is combined with real annotated data for training, T5-Base improves substantially, matching the performance of more heavily pretrained models.
📝 Abstract
We propose an agentic data augmentation method for Aspect-Based Sentiment Analysis (ABSA) that uses iterative generation and verification to produce high-quality synthetic training examples. To isolate the effect of agentic structure, we also develop a closely matched prompting-based baseline using the same model and instructions. Both methods are evaluated across three ABSA subtasks (Aspect Term Extraction (ATE), Aspect-Term Sentiment Classification (ATSC), and Aspect-Sentiment Pair Extraction (ASPE)), four SemEval datasets, and two encoder-decoder models: T5-Base and Tk-Instruct. Our results show that agentic augmentation outperforms raw prompting in label preservation in the augmented data, especially when the task requires aspect term generation. In addition, when combined with real data, agentic augmentation provides larger gains, consistently outperforming prompting-based generation. These benefits are most pronounced for T5-Base, while the more heavily pretrained Tk-Instruct exhibits smaller improvements. As a result, the augmented data helps T5-Base achieve performance comparable to its more heavily pretrained counterpart.
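The closed generate-and-verify loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `generate_candidate` and `verify_labels` are hypothetical stand-ins for the LLM generator and verifier agents, and the verification rule (every aspect term must still appear verbatim in the generated sentence) is one simple way to check label consistency.

```python
def generate_candidate(seed, attempt):
    # Stand-in for an LLM agent that paraphrases a labeled example.
    # A real agent would rewrite the sentence while preserving aspect terms.
    return {
        "text": f"{seed['text']} (variant {attempt})",
        "aspects": list(seed["aspects"]),
    }

def verify_labels(candidate):
    # Stand-in for the verifier agent: accept the candidate only if
    # every labeled aspect term still occurs verbatim in the text.
    return all(aspect in candidate["text"] for aspect in candidate["aspects"])

def augment(seed, max_rounds=3):
    """Closed loop: generate a candidate, verify it, retry on failure."""
    for attempt in range(1, max_rounds + 1):
        candidate = generate_candidate(seed, attempt)
        if verify_labels(candidate):
            return candidate
    return None  # label consistency could not be preserved

seed = {"text": "The battery life is great", "aspects": ["battery life"]}
augmented = augment(seed)
```

Candidates that fail verification are regenerated rather than kept, which is what drives the higher label retention rates reported for the agentic method.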