🤖 AI Summary
Existing fairness evaluation methods for LLM-based recommender systems inadequately address dual-dimensional bias, spanning both psychological traits (e.g., the Big Five personality dimensions) and sensitive demographic attributes (eight categories in total, including gender, race, and age).
Method: We propose FairEval, the first user-level fairness assessment framework to integrate psychological and sociodemographic attributes, together with its fairness metric PAFS (Personality-Aware Fairness Score). FairEval introduces the Big Five personality traits into fairness modeling, uses prompt engineering with ChatGPT-4o and Gemini-1.5-Flash, runs controlled comparative recommendation experiments, and combines statistical bias analysis with personality-embedding modeling for fine-grained bias quantification.
Contribution/Results: PAFS reaches scores of up to 0.9969 (ChatGPT-4o) and 0.9997 (Gemini-1.5-Flash), while the framework detects inter-group recommendation disparities of up to 34.79%, significantly surpassing conventional demographic-only approaches. The results empirically show that prompt design critically influences fairness outcomes, establishing an evaluation paradigm beyond traditional demographic-centric assessment.
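The exact PAFS formula is not given in this summary; as a hedged illustration only, a user-level fairness score of this kind can be sketched as one minus the normalized gap between the best- and worst-served user groups. The `pafs` helper and group labels below are hypothetical, not the paper's definition:

```python
def pafs(group_scores):
    """Hypothetical personality-aware fairness score (illustrative sketch,
    not the paper's formula): 1 minus the normalized gap between the
    best- and worst-served groups. group_scores maps a
    (demographic, personality) group to its mean recommendation-quality
    score; 1.0 means all groups are served equally."""
    values = list(group_scores.values())
    best, worst = max(values), min(values)
    if best == 0:
        return 1.0  # no recommendation signal at all; treat as fair
    return 1.0 - (best - worst) / best

# A 34.79% relative disparity between groups would yield a score of 0.6521,
# while near-equal treatment yields scores close to 1 (as reported above).
scores = {("female", "high-openness"): 1.0,
          ("male", "low-openness"): 0.6521}
print(round(pafs(scores), 4))  # 0.6521
```

Under this sketch, a score near 1 (e.g., 0.9969 or 0.9997) indicates near-identical treatment across groups, matching the interpretation of the reported values.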
📝 Abstract
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79 percent. These results highlight the importance of robustness to prompt variations and support the development of more inclusive recommendation systems.