Music2Palette: Emotion-aligned Color Palette Generation via Cross-Modal Representation Learning

📅 2025-07-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing music-to-palette generation methods suffer from two key limitations: (1) they output only a single dominant color, failing to capture dynamic emotional shifts in music; or (2) they rely on intermediate text or image representations, leading to loss of fine-grained emotional semantics. To address this, we propose an end-to-end cross-modal generation framework. We introduce MuCED—the first professionally annotated music-to-palette dataset—and design a joint architecture comprising a music encoder and a color decoder to directly model auditory-to-visual emotional mapping. We further incorporate a multi-objective optimization strategy grounded in Russell’s circumplex model of affect, jointly optimizing emotional alignment, color diversity, and palette coherence. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches across multiple quantitative metrics. Moreover, it exhibits superior expressiveness and practical utility in downstream applications including music-driven image recoloring, video generation, and visual analytics.
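The summary describes an end-to-end pipeline (music encoder feeding a color decoder, with no text or image intermediary). The paper's actual layer configuration is not given here, so the following is only an illustrative numpy sketch of that pipeline shape, with hypothetical dimensions and untrained random weights; `MusicEncoder` and `ColorDecoder` are placeholder names.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class MusicEncoder:
    """Toy stand-in for a music encoder: maps extracted audio
    features to an emotion-bearing latent vector (random weights)."""
    def __init__(self, in_dim=128, latent_dim=16):
        self.w = rng.standard_normal((in_dim, latent_dim)) * 0.1

    def __call__(self, features):
        return relu(features @ self.w)

class ColorDecoder:
    """Toy stand-in for a color decoder: maps the latent vector
    to an N-color RGB palette with components in [0, 1]."""
    def __init__(self, latent_dim=16, n_colors=5):
        self.w = rng.standard_normal((latent_dim, n_colors * 3)) * 0.1
        self.n_colors = n_colors

    def __call__(self, z):
        rgb = 1.0 / (1.0 + np.exp(-(z @ self.w)))  # sigmoid keeps colors valid
        return rgb.reshape(self.n_colors, 3)

# End-to-end: audio features in, palette out, no text/image intermediary.
palette = ColorDecoder()(MusicEncoder()(rng.standard_normal(128)))
```

The point of the sketch is the data flow, not the weights: auditory features map directly to a palette, which is what lets the model preserve fine-grained emotional semantics that an intermediate text caption or image would discard.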

📝 Abstract
Emotion alignment between music and palettes is crucial for effective multimedia content, yet misalignment creates confusion that weakens the intended message. Existing methods often generate only a single dominant color, missing emotion variation; others rely on indirect mappings through text or images, losing crucial emotion details. To address these challenges, we present Music2Palette, a novel method for emotion-aligned color palette generation via cross-modal representation learning. We first construct MuCED, a dataset of 2,634 expert-validated music-palette pairs aligned through Russell-based emotion vectors. To translate music directly into palettes, we propose a cross-modal representation learning framework with a music encoder and color decoder. We further propose a multi-objective optimization approach that jointly enhances emotion alignment, color diversity, and palette coherence. Extensive experiments demonstrate that our method outperforms current methods in interpreting music emotion and generating attractive, diverse color palettes. Our approach enables applications such as music-driven image recoloring, video generation, and data visualization, bridging the gap between auditory and visual emotional experiences.
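The abstract names three jointly optimized objectives: emotion alignment, color diversity, and palette coherence. The paper's exact loss terms are not reproduced on this page, so the sketch below is a hypothetical illustration of how such a weighted multi-objective loss could be composed; the emotion mapping (brightness as valence, warm-cool contrast as arousal) and the weights `w` are assumptions, not the authors' formulation.

```python
import numpy as np

def palette_losses(palette, target_emotion, w=(1.0, 0.5, 0.5)):
    """Illustrative multi-objective loss for a generated palette.

    palette: (N, 3) array of RGB colors in [0, 1]
    target_emotion: (2,) Russell-style (valence, arousal) vector in [-1, 1]
    Returns the weighted sum of three hypothetical terms.
    """
    palette = np.asarray(palette, dtype=float)
    target_emotion = np.asarray(target_emotion, dtype=float)

    # Hypothetical emotion mapping: brightness -> valence,
    # warm-cool (red minus blue) contrast -> arousal.
    valence = palette.mean() * 2.0 - 1.0
    arousal = (palette[:, 0] - palette[:, 2]).mean()
    emotion_loss = np.sum((np.array([valence, arousal]) - target_emotion) ** 2)

    # Diversity: reward palettes whose colors spread apart
    # (negative mean pairwise distance, so lower is more diverse).
    diffs = palette[:, None, :] - palette[None, :, :]
    diversity_loss = -np.sqrt((diffs ** 2).sum(-1)).mean()

    # Coherence: penalize abrupt jumps between neighboring palette slots.
    coherence_loss = np.abs(np.diff(palette, axis=0)).mean()

    return w[0] * emotion_loss + w[1] * diversity_loss + w[2] * coherence_loss
```

Note the built-in tension the weights have to balance: the diversity term pushes colors apart while the coherence term pulls neighboring colors together, which is why the objectives must be optimized jointly rather than independently.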
Problem

Research questions and friction points this paper is trying to address.

Achieving emotion alignment between music and color palettes
Generating diverse colors instead of single dominant ones
Avoiding indirect mappings that lose emotional details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-modal learning for music-to-palette translation
Multi-objective optimization for diverse coherent palettes
Expert-validated dataset for emotion-color alignment
Authors
Jiayun Hu
East China Normal University, Shanghai, China
Yueyi He
East China Normal University, Shanghai, China
Tianyi Liang
PhD, East China Normal University; Shanghai AI Lab; Shanghai Innovation Institute
Research interests: Multimodal Learning, LLMs, Image Editing
Changbo Wang
East China Normal University, Shanghai, China
Chenhui Li
Baidu
Research interests: AI, NLP, CV