🤖 AI Summary
Existing offensive language detection research suffers from reliance on outdated datasets and insufficient evaluation of generalization capability. To address this, this work focuses on contemporary Korean political discourse and introduces the first large-scale, newly annotated dataset for offensive language detection in this domain. We propose a three-paradigm pseudo-labeling framework that operates without ground-truth labels, integrating leave-one-out label consensus analysis with strategically designed single-shot prompting to enable lightweight, efficient modeling. Our approach uncovers systematic differences and label-agreement patterns across the distinct judgment paradigms. Empirical results demonstrate performance on par with resource-intensive supervised baselines, along with improved robustness and interpretability. This work establishes a reproducible, scalable paradigm for offensive language detection in low-resource, rapidly evolving linguistic environments.
📝 Abstract
Although offensive language continually evolves over time, even recent LLM-based studies have predominantly relied on outdated datasets and rarely evaluated generalization to unseen texts. In this study, we constructed a large-scale dataset of contemporary political discourse and employed three refined judgments in the absence of ground truth. Each judgment reflects a representative offensive language detection method and is carefully designed for optimal conditions. We identified distinct patterns for each judgment and demonstrated tendencies of label agreement using a leave-one-out strategy. By establishing pseudo-labels as ground truth for quantitative performance assessment, we observed that a strategically designed single prompt achieves performance comparable to more resource-intensive methods. This suggests a feasible approach for real-world settings with inherent resource constraints.
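The abstract's leave-one-out agreement analysis and majority-vote pseudo-labeling can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names, the `"offensive"`/`"clean"` label set, and the unanimity rule for the held-out comparison are all assumptions for demonstration.

```python
from collections import Counter

def majority_pseudo_label(votes):
    """Assign a pseudo-label by strict majority vote over judgments;
    return None when no label wins a majority (e.g. a tie)."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

def leave_one_out_agreement(votes):
    """For each judgment, hold it out and check whether it matches
    the majority label of the remaining judgments (if one exists)."""
    agreement = {}
    for i, held_out in enumerate(votes):
        rest = votes[:i] + votes[i + 1:]
        consensus = majority_pseudo_label(rest)
        agreement[i] = consensus is not None and held_out == consensus
    return agreement

# Three hypothetical judgments on one comment: two flag it, one does not.
votes = ["offensive", "offensive", "clean"]
print(majority_pseudo_label(votes))    # → offensive
print(leave_one_out_agreement(votes))
```

With three judgments, holding one out leaves only two, so the held-out judgment counts as agreeing only when the other two are unanimous; this makes the leave-one-out check stricter than the plain majority vote used for the final pseudo-label.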