"All of Me": Mining Users' Attributes from their Public Spotify Playlists

📅 2024-01-25
🏛️ The Web Conference
📈 Citations: 5
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether users’ private attributes—such as age, gender, and personality traits—can be inferred from their publicly available Spotify playlists. To this end, we construct the first large-scale, multi-attribute-aligned dataset comprising 10,286 real-world playlists contributed by 739 users. We propose an interpretable, cross-level feature modeling framework that integrates song-level audio features, artist-level metadata, and playlist-level social tags, combined with XGBoost and BERT-based multimodal embeddings. Our work provides the first systematic empirical validation of strong statistical associations between musical preferences and multidimensional user attributes, formalizing the computationally tractable principle that “musical expression reflects identity expression.” Experimental results demonstrate state-of-the-art performance: AUC scores reach up to 0.89 on prediction tasks for age, gender, and openness—significantly outperforming existing baselines.

Technology Category

Application Category

📝 Abstract
In the age of digital music streaming, playlists on platforms like Spotify have become an integral part of individuals' musical experiences. People create and publicly share their own playlists to express their musical tastes, promote the discovery of their favorite artists, and foster social connections. In this work, we aim to address the question: can we infer users' private attributes from their public Spotify playlists? To this end, we conducted an online survey involving 739 Spotify users, resulting in a dataset of 10,286 publicly shared playlists comprising over 200,000 unique songs and 55,000 artists. Then, we utilize statistical analyses and machine learning algorithms to build accurate predictive models for users' attributes.
Problem

Research questions and friction points this paper is trying to address.

Infer users' private attributes from public Spotify playlists.
Use statistical analyses and machine learning for prediction.
Address privacy concerns in digital music streaming data.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using statistical analyses on playlist data
Applying machine learning for attribute prediction
Mining user attributes from public Spotify playlists
🔎 Similar Papers
No similar papers found.