AI Summary
This work introduces the first background music recommendation task under conversational settings, aiming to recommend ambient music that is contextually appropriate yet non-intrusive for everyday multi-turn dialogues lacking explicit musical descriptions. To facilitate research in this direction, the authors construct DialBGM, the first benchmark dataset comprising 1,200 open-domain dialogues, each paired with four candidate music tracks and human-annotated preference rankings. They further propose a multidimensional evaluation framework encompassing contextual relevance, non-intrusiveness, and consistency. Systematic evaluations of audio-language models and multimodal large language models reveal that even the best-performing model achieves a Hit@1 score of no more than 35%, substantially lagging behind human performance and highlighting significant limitations in current approaches to aligning conversational context with musical semantics.
Abstract
Selecting appropriate background music (BGM) that supports natural human conversation is a common production step in media and interactive systems. In this paper, we introduce dialogue-conditioned BGM recommendation, in which a model must select non-intrusive, fitting music for a multi-turn conversation that often contains no explicit music descriptors. To study this novel problem, we present DialBGM, a benchmark of 1,200 open-domain daily dialogues, each paired with four candidate music clips and annotated with human preference rankings. Rankings are determined by background suitability criteria, including contextual relevance, non-intrusiveness, and consistency. We evaluate a wide range of open-source and proprietary models, including audio-language models and multimodal LLMs, and show that current models fall far short of human judgment: no model exceeds 35% Hit@1 when selecting the top-ranked clip. DialBGM provides a standardized benchmark for developing discourse-aware methods for BGM selection and for evaluating both retrieval-based and generative models.
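For readers unfamiliar with the headline metric, Hit@1 here simply measures how often a model's top pick matches the human-annotated top-ranked clip. The sketch below illustrates this, assuming each dialogue has a list of candidate clip IDs ordered best-first; the function and variable names are illustrative, not the authors' code.

```python
def hit_at_1(predicted_rankings, gold_rankings):
    """Fraction of dialogues where the model's top-ranked clip
    matches the human-annotated top-ranked clip.

    Each ranking is a list of candidate clip IDs, ordered best-first.
    """
    hits = sum(
        pred[0] == gold[0]
        for pred, gold in zip(predicted_rankings, gold_rankings)
    )
    return hits / len(gold_rankings)

# Toy example with four candidate clips per dialogue, as in DialBGM.
preds = [["b", "a", "c", "d"], ["a", "c", "b", "d"], ["d", "b", "a", "c"]]
golds = [["b", "d", "a", "c"], ["c", "a", "b", "d"], ["d", "a", "c", "b"]]
print(hit_at_1(preds, golds))  # 2 of 3 top picks match -> 0.666...
```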