K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in Korean implicit offensive language detoxification: high human annotation costs and rapid lexical evolution that quickly renders static datasets obsolete. To this end, the authors propose the first automated paired-data generation framework designed for dynamically evolving implicit offensiveness and trending slang. The method integrates rule-enhanced template generation, trend-aware slang injection, semantic consistency constraints, and instruction-tuned detoxification model training, improving data timeliness, coverage of implicit toxicity, and cross-lingual generalizability. Experiments show that the generated dataset outperforms existing Korean benchmarks in both implicit offense detection and neutral-toxic pair consistency, and that lightweight instruction tuning alone achieves state-of-the-art detoxification performance. The framework establishes a new paradigm for language-safety governance in low-resource, rapidly evolving linguistic environments.

📝 Abstract
Language detoxification involves removing toxicity from offensive language. While a neutral-toxic paired dataset provides a straightforward approach for training detoxification models, creating such datasets presents several challenges: i) the need for human annotation to build paired data, and ii) the rapid evolution of offensive terms, rendering static datasets quickly outdated. To tackle these challenges, we introduce an automated paired data generation pipeline, called K/DA. This pipeline is designed to generate offensive language with implicit offensiveness and trend-aligned slang, making the resulting dataset suitable for detoxification model training. We demonstrate that the dataset generated by K/DA exhibits high pair consistency and greater implicit offensiveness compared to existing Korean datasets, and also demonstrates applicability to other languages. Furthermore, it enables effective training of a high-performing detoxification model with simple instruction fine-tuning.
Problem

Research questions and friction points this paper is trying to address.

Automated generation of offensive-neutral Korean language pairs
Addressing rapid obsolescence of static offensive language datasets
Enhancing detoxification models with implicit offensiveness and slang
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated offensive language generation pipeline
Trend-aligned slang for dynamic dataset updates
Simple instruction fine-tuning for detoxification
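The last point, detoxification via simple instruction fine-tuning, amounts to mapping each toxic-neutral pair into a prompt/completion record. A minimal sketch, assuming a hypothetical record layout and prompt template (the paper's actual format is not specified here):

```python
# Hypothetical sketch: converting toxic-neutral pairs into instruction-tuning
# records. The field names ("toxic", "neutral") and the prompt template are
# illustrative assumptions, not the paper's actual schema.

PROMPT_TEMPLATE = (
    "Rewrite the following sentence to remove offensive language "
    "while preserving its meaning.\nSentence: {toxic}\nRewrite:"
)

def to_instruction_example(pair: dict) -> dict:
    """Map one toxic-neutral pair to a (prompt, completion) record."""
    return {
        "prompt": PROMPT_TEMPLATE.format(toxic=pair["toxic"]),
        "completion": " " + pair["neutral"],
    }

pairs = [
    {"toxic": "<implicitly offensive sentence>",
     "neutral": "<detoxified sentence>"},
]
dataset = [to_instruction_example(p) for p in pairs]
```

Records in this shape can be fed to any standard supervised fine-tuning loop; no architecture changes are needed, which is what makes the approach "simple."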