PromptAug: Fine-grained Conflict Classification Using Data Augmentation

📅 2025-06-24
🤖 AI Summary
To address the dual challenges of scarce high-quality labeled data and large language model (LLM) guardrails that block the generation of offensive content, this paper proposes PromptAug, an LLM-based data augmentation method for fine-grained conflict classification on social media. PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on conflict and emotion datasets. The authors evaluate it against other augmentation methods using extreme data scarcity scenarios, a quantitative diversity analysis, and a qualitative thematic analysis. The thematic analysis identifies four problematic patterns in augmented text: Linguistic Fluidity, Humour Ambiguity, Augmented Content Ambiguity, and Augmented Content Misinterpretation. Overall, the work presents PromptAug as an effective augmentation method for sensitive tasks such as conflict detection.

📝 Abstract
Given the rise of conflicts on social media, effective classification models to detect harmful behaviours are essential. Following the garbage-in-garbage-out maxim, machine learning performance depends heavily on training data quality. However, high-quality labelled data, especially for nuanced tasks like identifying conflict behaviours, is limited, expensive, and difficult to obtain. Additionally, as social media platforms increasingly restrict access to research data, text data augmentation is gaining attention as an alternative to generate training data. Augmenting conflict-related data poses unique challenges due to Large Language Model (LLM) guardrails that prevent generation of offensive content. This paper introduces PromptAug, an innovative LLM-based data augmentation method. PromptAug achieves statistically significant improvements of 2% in both accuracy and F1-score on conflict and emotion datasets. To thoroughly evaluate PromptAug against other data augmentation methods we conduct a robust evaluation using extreme data scarcity scenarios, quantitative diversity analysis and a qualitative thematic analysis. The thematic analysis identifies four problematic patterns in augmented text: Linguistic Fluidity, Humour Ambiguity, Augmented Content Ambiguity, and Augmented Content Misinterpretation. Overall, this work presents PromptAug as an effective method for augmenting data in sensitive tasks like conflict detection, offering a unique, interdisciplinary evaluation grounded in both natural language processing and social science methodology.
Problem

Research questions and friction points this paper is trying to address.

Limited high-quality labeled data for conflict behavior classification
Challenges in augmenting conflict data due to LLM guardrails
Need for effective data augmentation in sensitive tasks such as conflict detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

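PromptAug, an LLM-based data augmentation method for sensitive tasks such as conflict detection
Statistically significant 2% improvements in accuracy and F1-score on conflict and emotion datasets
Evaluation under extreme data scarcity, with quantitative diversity analysis and qualitative thematic analysis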
Oliver Warke
University of Glasgow, United Kingdom
Joemon M. Jose
University of Glasgow, United Kingdom
Faegheh Hasibi
Assistant Professor, Radboud University
Information retrieval, Natural language processing, Conversational AI
Jan Breitsohl
University of Glasgow, United Kingdom