Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization

📅 2026-05-27
🤖 AI Summary
This work addresses the ethical constraints on online affective experimentation in clinical settings by proposing an offline, world-model-based method for affect-aware music recommendation. A causal Transformer jointly predicts user behavior (engagement and binary rating) and emotional feedback (self-reported valence and arousal), and rollouts from this model serve as a simulated environment for offline policy training and pre-deployment stress testing. The system, AMRS, is deployed on LUCID's health-and-wellness platforms. The recommender policy is initialized by behavior cloning and fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts behavioral and affective signals with usable fidelity, and DPO improves predicted valence and arousal over the cloned baseline while keeping a similar diversity profile and avoiding the distributional collapse produced by greedy optimization.
📝 Abstract
Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethically constrained, particularly for clinical populations who cannot reliably skip a song or report distress. We describe AMRS, the Affective Music Recommendation System deployed on LUCID's health-and-wellness platforms, which serve clinical users (primarily older adults with neurocognitive conditions) and consumer-wellness users across energize, focus, calm, and sleep modes. AMRS is built around a rollout-based world model: a causal transformer trained on logged listening data to jointly predict engagement, binary rating, and self-reported valence and arousal. The world model serves both as an in-silico simulator for offline policy training and as a stress-testing tool before deployment. A recommender policy initialized by behaviour cloning is fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts both behavioural and affective signals with usable fidelity; DPO improves predicted valence and arousal over the cloned baseline while maintaining a similar diversity profile and avoiding the distributional collapse produced by greedy optimization. We position the work as an early deployed validation of a methodology for affective recommendation when online experimentation is ethically untenable.
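As a rough illustration of the offline fine-tuning step described in the abstract, the sketch below ranks two simulated rollouts by a weighted utility and scores the resulting pair with the standard DPO loss. It is a minimal sketch under stated assumptions, not the authors' implementation: the weight values, the dictionary keys, and the function names `utility`, `preference_pair`, and `dpo_loss` are all hypothetical, and the abstract does not specify how the utility is parameterized (for example, a calm or sleep mode might plausibly use a negative arousal weight).

```python
import math

# Hypothetical weights: the abstract only says the utility is "configurable".
WEIGHTS = {"engagement": 0.25, "rating": 0.25, "valence": 0.25, "arousal": 0.25}


def utility(pred, weights=WEIGHTS):
    """Weighted sum of world-model predictions for one simulated rollout."""
    return sum(weights[k] * pred[k] for k in weights)


def preference_pair(rollout_a, rollout_b, weights=WEIGHTS):
    """Return (chosen, rejected), ordering two rollouts by utility."""
    if utility(rollout_a, weights) >= utility(rollout_b, weights):
        return rollout_a, rollout_b
    return rollout_b, rollout_a


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_* are log-probabilities under the policy being trained;
    ref_logp_* are log-probabilities under the frozen reference policy
    (here, the behaviour-cloned policy would play that role).
    """
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Usage: two hypothetical world-model predictions for candidate recommendations.
a = {"engagement": 0.9, "rating": 1.0, "valence": 0.6, "arousal": 0.4}
b = {"engagement": 0.5, "rating": 0.0, "valence": 0.3, "arousal": 0.2}
chosen, rejected = preference_pair(a, b)
loss = dpo_loss(-1.0, -2.0, -1.2, -1.8)
```

When the policy equals the reference, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the chosen rollout relative to the reference.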
Problem

Research questions and friction points this paper is trying to address.

Affective Recommendation
Offline Preference Optimization
Ethical Constraints
Clinical Populations
Music Recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rollout-Based World Model
Affective Recommendation
Offline Preference Optimization
Direct Preference Optimization (DPO)
Causal Transformer