Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

📅 2026-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how to balance psychological well-being and dignity when engaging members of stigmatized groups—specifically, 20 graduates of nonprestigious colleges in South Korea—in red-teaming large language models using their lived experiences of discrimination. Through a participatory red-teaming approach, participants' firsthand accounts were transformed into bias-detection strategies. Employing a mixed-methods design combining qualitative interviews and quantitative analysis, the research empirically demonstrates that this process entails both psychological burdens and empowering effects: while participants reported stress and identity-related distress, they also developed a stronger sense of agency and empowerment through their role as guardians of the AI ecosystem. The work proposes a paradigm that integrates ethical care with technical evaluation, offering a methodological contribution toward inclusive AI governance.

📝 Abstract
Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss implications for designing participatory red-teaming that prioritizes both the ethical treatment and empowerment of stigmatized groups.
Problem

Research questions and friction points this paper is trying to address.

participatory red-teaming
stereotyping
large language models
psychological well-being
harmful behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

participatory red-teaming
lived experience
stereotypical bias
large language models
ethical AI
Sieun Kim
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Yeeun Jo
Department of Education, Keimyung University, Daegu, Republic of Korea
Sungmin Na
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Hyunseung Lim
KAIST
Human-AI Interaction, Human-Computer Interaction
Eunchae Lee
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Yu Min Choi
Department of Industrial Design, KAIST, Daejeon, Republic of Korea
Soohyun Cho
Department of Education, Keimyung University, Daegu, Republic of Korea
Hwajung Hong
Associate Professor, KAIST
Human-Computer Interaction, Social Computing, Design Research, Health Informatics