Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Toxicity detection is inherently subjective because perspectives vary across demographic groups, and existing large language model (LLM) prompting methods perform inconsistently across personas and base models. This work presents one of the first systematic evaluations of persona-aware prompting strategies, including an automated prompt optimization strategy, and finds that no single method dominates across all model-persona pairs. To exploit the complementary errors of four prompting variants, it introduces a lightweight meta-ensemble: an SVM trained on the 4-bit vector of their predictions. The SVM ensemble consistently outperforms both the individual prompting methods and conventional majority voting, achieving the strongest overall performance across diverse personas.

📝 Abstract
Toxicity detection is inherently subjective, shaped by the diverse perspectives and social priors of different demographic groups. While "pluralistic" modeling, as used in economics and the social sciences, aims to capture perspective differences across contexts, current Large Language Model (LLM) prompting techniques yield inconsistent results across personas and base models. In this work, we conduct a systematic evaluation of persona-aware toxicity detection, showing that no single prompting method, including our proposed automated prompt optimization strategy, uniformly dominates across all model-persona pairs. To exploit complementary errors, we explore ensembling four prompting variants and propose a lightweight meta-ensemble: an SVM over the 4-bit vector of prompt predictions. Our results demonstrate that the proposed SVM ensemble consistently outperforms individual prompting methods and traditional majority-voting techniques, achieving the strongest overall performance across diverse personas. This work provides one of the first systematic comparisons of persona-conditioned prompting for toxicity detection and offers a robust method for pluralistic evaluation in subjective NLP tasks.
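To make the meta-ensemble idea concrete, here is a minimal sketch of a classifier over a 4-bit vector of prompt predictions, compared against majority voting. It is not the authors' implementation: the abstract does not specify the kernel, feature encoding, or training setup, so the linear SVM trained by subgradient descent, the -1/+1 encoding, the hyperparameters, and the toy data (where only the first prompt is reliable) are all assumptions made for illustration.

```python
from itertools import product

def majority_vote(bits):
    """Baseline: toxic (1) only if a strict majority of prompts says toxic."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def to_signed(bits):
    """Map 0/1 prompt predictions to -1/+1 features (assumed encoding)."""
    return [2 * b - 1 for b in bits]

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for bits, label in zip(X, y):
            x, t = to_signed(bits), 2 * label - 1
            margin = t * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # example violates the margin
                for j in range(d):
                    grad_w[j] -= t * x[j]
                grad_b -= t
        w = [wj - lr * (lam * wj + gj / n) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def svm_predict(w, b, bits):
    """Predict toxic (1) when the linear score is positive."""
    score = sum(wj * xj for wj, xj in zip(w, to_signed(bits))) + b
    return 1 if score > 0 else 0

def accuracy(preds, labels):
    return sum(p == t for p, t in zip(preds, labels)) / len(labels)

# Hypothetical toy data: every possible 4-bit prediction vector, with the
# gold label equal to the first prompt's prediction (the other three are noise).
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [bits[0] for bits in X]

w, b = train_linear_svm(X, y)
mv_acc = accuracy([majority_vote(x) for x in X], y)
svm_acc = accuracy([svm_predict(w, b, x) for x in X], y)
print(f"majority vote: {mv_acc:.4f}, SVM ensemble: {svm_acc:.4f}")
```

On this toy set, majority voting gets 11 of 16 vectors right while the learned ensemble gets all 16, because it can learn to weight the reliable prompt more heavily. The sketch scores on its own training data purely for illustration; a real evaluation would use held-out examples per persona.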
Problem

Research questions and friction points this paper is trying to address.

toxicity detection
persona-aware
subjectivity
prompting
pluralistic evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

persona-aware toxicity detection
prompt optimization
learned ensembling
pluralistic NLP
SVM meta-ensemble
Authors

Berk Atil
Pennsylvania State University
R. Passonneau
Pennsylvania State University
Ninareh Mehrabi
Amazon
AI Safety, Responsible AI