FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models

📅 2025-03-25

📈 Citations: 0

✨ Influential: 0

career value

222K/year

🤖 AI Summary

Existing fairness benchmarks fail to expose the bias vulnerability of large language models (LLMs) under extreme adversarial conditions. Method: We propose FLEX, the first rigorous fairness evaluation framework that integrates bias-amplifying prompt systems into fairness assessment. It combines adversarial prompt engineering, multidimensional fairness metrics (spanning gender, race, geography, etc.), and cross-model consistency analysis. Results: Experiments show that mainstream LLMs exhibit significantly degraded fairness performance on FLEX, while conventional benchmarks underestimate their bias risk by an average of 42%. This work bridges a critical gap in robustness evaluation under implicit bias induction and advances the development of more stringent, real-world threat-aware fairness evaluation standards for LLMs.

Technology Category

Application Category

📝 Abstract

Recent advancements in Large Language Models (LLMs) have significantly enhanced interactions between users and models. These advancements concurrently underscore the need for rigorous safety evaluations due to the manifestation of social biases, which can lead to harmful societal impacts. Despite these concerns, existing benchmarks may overlook the intrinsic weaknesses of LLMs, which can generate biased responses even with simple adversarial instructions. To address this critical gap, we introduce a new benchmark, Fairness Benchmark in LLM under Extreme Scenarios (FLEX), designed to test whether LLMs can sustain fairness even when exposed to prompts constructed to induce bias. To thoroughly evaluate the robustness of LLMs, we integrate prompts that amplify potential biases into the fairness assessment. Comparative experiments between FLEX and existing benchmarks demonstrate that traditional evaluations may underestimate the inherent risks in models. This highlights the need for more stringent LLM evaluation benchmarks to guarantee safety and fairness.

Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness of fairness in large language models

Assessing bias risks under adversarial prompts

Developing stringent benchmarks for model safety

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces FLEX benchmark for extreme fairness testing

Integrates bias-amplifying prompts in evaluations

Demonstrates underestimation of risks in traditional benchmarks

🔎 Similar Papers

No similar papers found.