🤖 AI Summary
This study addresses the risk that large language models (LLMs) may introduce racial and gender biases when summarizing human life narratives, which could lead to misinterpretation and representational harm. Working with psychologists, the authors propose a summarization-based analytical pipeline that combines inductive thematic summarization, manual annotation, and bias detection to assess the situated biases LLMs bring to the abstractive interpretation of text. The pipeline identifies racial and gender biases in LLM-generated summaries that carry the potential for representational harm, demonstrating that it is feasible to surface such risks. The work recommends incorporating this kind of bias-aware analysis into qualitative studies that use LLMs to interpret participants' written text or transcribed speech.
📝 Abstract
Increasingly, studies are exploring the use of Large Language Models (LLMs) to accelerate or scale qualitative analysis of text data. While LLM accuracy can be compared directly against human labels for deductive coding (labeling text), it is more challenging to judge the ethics and effectiveness of using LLMs in abstractive methods such as inductive thematic analysis. We collaborate with psychologists to study the abstractive claims LLMs make about human life stories, asking: how does using an LLM as an interpreter of meaning affect the conclusions and perspectives of a study? We propose a summarization-based pipeline for surfacing the biases in perspective-taking that an LLM might exhibit when interpreting these life stories. We demonstrate that our pipeline can identify both race and gender bias with the potential for representational harm. Finally, we encourage the use of this analysis in future studies involving LLM-based interpretation of study participants' written text or transcribed speech, to characterize a positionality portrait for the study.