FairEval: Evaluating Fairness in LLM-Based Recommendations with Personality Awareness

📅 2025-04-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fairness evaluation methods for LLM-based recommender systems inadequately address dual-dimensional bias, spanning both psychological traits (e.g., Big Five personality dimensions) and sensitive demographic attributes (e.g., gender, race, age; eight categories in total). Method: The authors propose PAFS (Personality-Aware Fairness Score), the first user-level fairness assessment framework integrating psychological and sociodemographic attributes. It introduces Big Five personality traits into fairness modeling, leverages prompt engineering with ChatGPT-4o and Gemini-1.5-Flash, conducts controlled comparative recommendation experiments, and combines statistical bias analysis with personality embedding modeling for fine-grained bias quantification. Contribution/Results: PAFS yields fairness scores up to 0.9969 (ChatGPT-4o) and 0.9997 (Gemini-1.5-Flash) while detecting up to 34.79% inter-group recommendation disparity, substantially exceeding what conventional demographic-only approaches surface. The results empirically show that prompt design critically influences fairness outcomes, establishing an evaluation paradigm beyond traditional demographic-centric assessment.
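The summary does not spell out how PAFS quantifies inter-group disparity. As a minimal sketch, one plausible user-level computation compares the recommendation lists an LLM returns for otherwise-identical prompts that differ only in a demographic or personality attribute, takes the worst pairwise divergence, and reports one minus that divergence as the fairness score. The function names (`fairness_score`, `pairwise_disparity`) and the Jaccard-overlap choice are illustrative assumptions, not the paper's actual formula.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap between two recommendation lists (order-insensitive)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

def pairwise_disparity(group_recs):
    """Worst-case pairwise disparity (1 - overlap) across attribute groups."""
    pairs = combinations(group_recs.values(), 2)
    return max(1.0 - jaccard(a, b) for a, b in pairs)

def fairness_score(group_recs):
    """Hypothetical PAFS-style score: 1.0 means identical recommendations
    for every group; lower values indicate larger inter-group disparity."""
    return 1.0 - pairwise_disparity(group_recs)

# Same music-recommendation prompt, varying only the user's stated attribute:
recs = {
    "group_a": ["song1", "song2", "song3", "song4"],
    "group_b": ["song1", "song2", "song3", "song5"],
}
print(round(fairness_score(recs), 4))  # prints 0.6
```

Under this reading, the reported 34.79% disparity would correspond to a fairness score of roughly 0.65 on the most divergent group pair, while scores near 0.997 indicate nearly identical recommendations across groups.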

📝 Abstract
Recent advances in Large Language Models (LLMs) have enabled their application to recommender systems (RecLLMs), yet concerns remain regarding fairness across demographic and psychological user dimensions. We introduce FairEval, a novel evaluation framework to systematically assess fairness in LLM-based recommendations. FairEval integrates personality traits with eight sensitive demographic attributes, including gender, race, and age, enabling a comprehensive assessment of user-level bias. We evaluate models, including ChatGPT 4o and Gemini 1.5 Flash, on music and movie recommendations. FairEval's fairness metric, PAFS, achieves scores up to 0.9969 for ChatGPT 4o and 0.9997 for Gemini 1.5 Flash, with disparities reaching 34.79%. These results highlight the importance of robustness to prompt sensitivity and support more inclusive recommendation systems.
Problem

Research questions and friction points this paper is trying to address.

Assessing fairness in LLM-based recommendations across demographics
Integrating personality traits with sensitive attributes for bias evaluation
Evaluating model robustness to prompt sensitivity for inclusive systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

FairEval framework assesses fairness in LLM recommendations
Integrates personality traits with demographic attributes
PAFS metric evaluates model fairness effectively
Authors

Chandan Kumar Sah — MTech (Research) student, Indian Institute of Science (Data-driven control, Koopman Operator Theory, Multi-agent systems, Reinforcement learning)
Xiaoli Lian — School of Computer Science and Engineering, Beihang University, Beijing, China
Tony Xu — University of Toronto (Computer Vision, Medical Imaging, Deep Learning)
Li Zhang — School of Computer Science and Engineering, Beihang University, Beijing, China