Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how to balance psychological well-being and dignity when engaging members of stigmatized groups—specifically, 20 graduates of nonprestigious colleges in South Korea—in red-teaming large language models using their lived experiences of discrimination. Through a participatory red-teaming approach, participants' firsthand accounts were transformed into bias-detection strategies. Employing a mixed-methods design combining qualitative interviews and quantitative analysis, the research empirically demonstrates that this process entails both psychological burdens and empowering effects: while participants reported stress and identity-related distress, they also developed a stronger sense of agency and empowerment through their role as guardians of the AI ecosystem. The work proposes a paradigm that integrates ethical care with technical evaluation, offering a methodological contribution toward inclusive AI governance.

📝 Abstract
Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss implications for designing participatory red-teaming that prioritizes both the ethical treatment and empowerment of stigmatized groups.
Problem

Research questions and friction points this paper is trying to address.

participatory red-teaming
stereotyping
large language models
psychological well-being
harmful behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

participatory red-teaming
lived experience
stereotypical bias
large language models
ethical AI
Sieun Kim
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Yeeun Jo
Department of Education, Keimyung University, Daegu, Republic of Korea
Sungmin Na
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Hyunseung Lim
KAIST
Human-AI Interaction, Human-Computer Interaction
Eunchae Lee
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Yu Min Choi
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Soohyun Cho
Department of Education, Keimyung University, Daegu, Republic of Korea
Hwajung Hong
Associate Professor, KAIST
Human-Computer Interaction, Social Computing, Design Research, Health Informatics