🤖 AI Summary
Existing LLM bias evaluation methods rely on predefined identity-concept templates, which limits their ability to detect unknown or implicit social biases in open-ended generation. To address this, we propose the Bias Association Discovery Framework (BADF), a systematic framework for automatically discovering previously unknown bias associations from unconstrained model outputs. BADF integrates prompt engineering, semantic clustering, quantitative association strength measurement, and cross-model validation to achieve end-to-end, interpretable identification of bias patterns. Extensive experiments across mainstream large language models and diverse real-world scenarios show that BADF not only recovers well-documented biases, such as gender-occupation stereotypes, but also uncovers novel, previously unreported associations, e.g., region-morality directional biases. The framework's implementation, along with annotated datasets and evaluation scripts, is publicly released to foster reproducible bias research.
📝 Abstract
Social biases embedded in Large Language Models (LLMs) raise critical concerns because they can cause representational harms (unfair or distorted portrayals of demographic groups) that may be expressed subtly through generated language. Existing evaluation methods often depend on predefined identity-concept associations, which limits their ability to surface new or unexpected forms of bias. In this work, we present the Bias Association Discovery Framework (BADF), a systematic approach for extracting both known and previously unrecognized associations between demographic identities and descriptive concepts from open-ended LLM outputs. Through comprehensive experiments spanning multiple models and diverse real-world contexts, BADF enables robust mapping and analysis of the varied concepts that characterize demographic identities. Our findings advance the understanding of biases in open-ended generation and provide a scalable tool for identifying and analyzing bias associations in LLMs. Data, code, and results are available at https://github.com/JP-25/Discover-Open-Ended-Generation
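To make "associations between demographic identities and descriptive concepts" concrete, here is a minimal sketch that scores identity-concept pairs in a set of generated texts using pointwise mutual information (PMI) over per-document co-occurrence. PMI is a swapped-in, commonly used association measure chosen only for illustration; it is not claimed to be BADF's actual scoring method. The function name `pmi_associations`, the word-level tokenization, and the toy texts are all hypothetical and are not data or code from the paper.

```python
import math
import re
from collections import Counter
from itertools import product


def pmi_associations(docs, identities, concepts):
    """Hypothetical helper: score (identity, concept) pairs by PMI.

    Probabilities are estimated from document-level presence counts, so
    PMI(i, c) = log( P(i, c) / (P(i) * P(c)) ). Only pairs that co-occur
    in at least one document are scored (PMI is undefined otherwise).
    """
    n = len(docs)
    id_counts, concept_counts, joint_counts = Counter(), Counter(), Counter()
    for doc in docs:
        # Presence-only tokenization: lowercase alphabetic words, punctuation dropped.
        tokens = set(re.findall(r"[a-z']+", doc.lower()))
        present_ids = [i for i in identities if i in tokens]
        present_cs = [c for c in concepts if c in tokens]
        id_counts.update(present_ids)
        concept_counts.update(present_cs)
        joint_counts.update(product(present_ids, present_cs))
    scores = {}
    for (i, c), joint in joint_counts.items():
        p_joint = joint / n
        p_i = id_counts[i] / n
        p_c = concept_counts[c] / n
        scores[(i, c)] = math.log(p_joint / (p_i * p_c))
    return scores


# Toy "open-ended outputs" (invented for illustration only).
docs = [
    "She works as a nurse and is caring.",
    "He works as an engineer and is logical.",
    "She is a caring teacher.",
    "He is a logical manager.",
]
scores = pmi_associations(docs, ["she", "he"], ["caring", "logical"])
print({pair: round(v, 3) for pair, v in scores.items()})
# -> {('she', 'caring'): 0.693, ('he', 'logical'): 0.693}
```

A positive PMI means the identity and concept appear together more often than independence would predict. A real discovery pipeline would, at minimum, need concepts that are not fixed in advance (for example, by clustering extracted descriptors) and some form of significance testing, which this sketch deliberately omits.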