🤖 AI Summary
This study investigates whether users’ private attributes, such as age, gender, and personality traits, can be inferred from their publicly available Spotify playlists. To this end, we construct the first large-scale, multi-attribute-aligned dataset, comprising 10,286 real-world playlists contributed by 739 users. We propose an interpretable, cross-level feature modeling framework that integrates song-level audio features, artist-level metadata, and playlist-level social tags, and combines them with XGBoost and BERT-based multimodal embeddings. Our work provides the first systematic empirical validation of strong statistical associations between musical preferences and multidimensional user attributes, giving a computationally tractable form to the principle that “musical expression reflects identity expression.” Experimental results demonstrate state-of-the-art performance: AUC scores reach up to 0.89 on prediction tasks for age, gender, and openness, significantly outperforming existing baselines.
📝 Abstract
In the age of digital music streaming, playlists on platforms like Spotify have become an integral part of individuals' musical experiences. People create and publicly share their own playlists to express their musical tastes, promote the discovery of their favorite artists, and foster social connections. In this work, we address the question: can we infer users' private attributes from their public Spotify playlists? To this end, we conducted an online survey of 739 Spotify users, which yielded a dataset of 10,286 publicly shared playlists comprising over 200,000 unique songs and 55,000 artists. We then apply statistical analyses and machine learning algorithms to build accurate predictive models of users' attributes.
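
To make the inference setting concrete, the following is a minimal sketch of how a user attribute might be predicted from playlist contents. It is not the authors' pipeline: the feature names (`energy`, `acousticness`), the age-group labels, and the toy data are all hypothetical, and a simple nearest-centroid classifier stands in for the machine learning algorithms, which the abstract does not specify.

```python
from statistics import mean


def user_vector(playlists):
    """Average each audio feature over all songs in all of a user's playlists."""
    songs = [song for playlist in playlists for song in playlist]
    keys = sorted(songs[0])  # fixed feature order: alphabetical by feature name
    return [mean(song[k] for song in songs) for k in keys]


def fit_centroids(vectors, labels):
    """Compute one mean vector (centroid) per attribute label."""
    grouped = {}
    for vec, label in zip(vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in grouped.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: dist(centroids[label]))


# Hypothetical toy data: each user is a list of playlists, each playlist a list of songs.
train_users = [
    [[{"energy": 0.9, "acousticness": 0.1}, {"energy": 0.8, "acousticness": 0.2}]],
    [[{"energy": 0.2, "acousticness": 0.9}], [{"energy": 0.3, "acousticness": 0.7}]],
]
train_labels = ["under_30", "30_plus"]  # hypothetical age groups from a survey

centroids = fit_centroids([user_vector(u) for u in train_users], train_labels)

new_user = [[{"energy": 0.85, "acousticness": 0.15}]]
prediction = predict(centroids, user_vector(new_user))
print(prediction)
```

The point of the sketch is the shape of the problem: public playlist contents are aggregated into a per-user feature vector, and survey-reported attributes serve as labels for a supervised model.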