🤖 AI Summary
This study exposes implicit identity bias and transparency deficits in chatbot-based recommendation systems. When providing personalized suggestions to U.S. users across four racial groups, mainstream consumer-grade large language models (LLMs), including GPT-4, Claude, and Gemini, generate stereotyped recommendations (p < 0.001) whether the user's race is declared explicitly or inferred implicitly from prompts, yet none disclose this influence in their outputs. Using multi-turn prompt engineering and rigorously controlled experiments, the work provides the first systematic empirical validation of identity dependence in LLM-driven recommendations. Key contributions are: (1) demonstrating statistically significant and pervasive effects of identity features on recommendation outcomes; (2) identifying a critical gap in bias explainability within current systems; and (3) proposing the design principle that "identity influence must be explicitly annotated," thereby advancing both theoretical foundations and practical pathways toward fair, transparent AI recommendation systems.
📝 Abstract
We demonstrate that when people use large language models (LLMs) to generate recommendations, the LLMs produce responses that reflect both what the user wants and who the user is. While personalized recommendations are often desired by users, in practice it can be difficult to distinguish bias from personalization: we find that models generate racially stereotypical recommendations regardless of whether the user revealed their identity intentionally, through explicit indications, or unintentionally, through implicit cues. We argue that chatbots ought to transparently indicate when recommendations are influenced by a user's revealed identity characteristics, but observe that they currently fail to do so. Our experiments show that even though a user's revealed identity significantly influences model recommendations (p < 0.001), model responses obfuscate this fact when users inquire about it. This bias and lack of transparency occur consistently across multiple popular consumer LLMs and across four American racial groups.
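The claim that identity significantly influences recommendations (p < 0.001) implies a test of whether recommendation distributions differ across identity conditions. The paper does not specify its exact statistical procedure here; the sketch below illustrates one generic way such an effect could be tested, using a permutation test on the divergence between recommendation-category distributions from two identity conditions. All data and names (`group_a`, `group_b`, the category labels) are hypothetical placeholders, not the paper's data or method.

```python
import random
from collections import Counter

# Hypothetical recommendation categories returned by a model for the same
# query under two identity conditions (illustrative only, not real data).
group_a = ["jazz", "hip-hop", "hip-hop", "gospel", "hip-hop", "jazz"] * 20
group_b = ["rock", "indie", "rock", "folk", "indie", "rock"] * 20

def divergence(a, b):
    """Total variation distance between two empirical category distributions."""
    ca, cb = Counter(a), Counter(b)
    categories = set(ca) | set(cb)
    return 0.5 * sum(abs(ca[c] / len(a) - cb[c] / len(b)) for c in categories)

def permutation_test(a, b, trials=10_000, seed=0):
    """Estimate the p-value: how often random relabeling of the pooled
    samples produces a divergence at least as large as the observed one.
    A small p-value means the identity condition predicts the output."""
    rng = random.Random(seed)
    observed = divergence(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)
        if divergence(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)

print(f"p = {permutation_test(group_a, group_b):.5f}")
```

With fully disjoint category sets, as in this toy data, random relabelings essentially never reproduce the observed separation, so the estimated p-value falls well below 0.001, mirroring the kind of significance level the abstract reports.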