AI Summary
This study investigates whether natural language (NL) music taste profiles automatically generated by large language models (LLMs) are perceived as accurate by users, and whether systematic biases exist with respect to user attributes (e.g., mainstreamness, taste diversity) and item characteristics (e.g., genre, country of origin). Leveraging listening histories to generate NL profiles, we conduct a large-scale user survey and evaluate downstream recommendation performance to establish, for the first time, a joint analysis of profile endorsement and recommendation fairness. Results reveal that users with higher mainstreamness and lower taste diversity exhibit significantly greater endorsement; conversely, non-Western and niche-genre items substantially reduce endorsement, and this bias persists in recommendation accuracy and coverage. Our work uncovers implicit cultural and cognitive biases embedded in LLM-driven explainable recommender systems, providing novel empirical evidence and a methodological framework for designing fairer, more trustworthy personalized recommender systems.
Abstract
One particularly promising use case of Large Language Models (LLMs) for recommendation is the automatic generation of Natural Language (NL) user taste profiles from consumption data. These profiles offer interpretable and editable alternatives to opaque collaborative filtering representations, enabling greater transparency and user control. However, it remains unclear whether users consider these profiles to be accurate representations of their taste, which is crucial for trust and usability. Moreover, because LLMs inherit societal and data-driven biases, profile quality may vary systematically across user and item characteristics. In this paper, we study this issue in the context of music streaming, where personalization is challenged by a large and culturally diverse catalog. We conduct a user study in which participants rate NL profiles generated from their own listening histories. We analyze whether identification with the profiles is biased by user attributes (e.g., mainstreamness, taste diversity) and item features (e.g., genre, country of origin). We also compare these patterns to those observed when the profiles are used in a downstream recommendation task. Our findings highlight both the potential and the limitations of scrutable, LLM-based profiling in personalized systems.