🤖 AI Summary
Existing music-to-palette generation methods suffer from two key limitations: (1) they output only a single dominant color, failing to capture dynamic emotional shifts in music; or (2) they rely on intermediate text or image representations, losing fine-grained emotional semantics. To address this, we propose an end-to-end cross-modal generation framework. We introduce MuCED, the first professionally annotated music-to-palette dataset, and design a joint architecture comprising a music encoder and a color decoder to directly model the auditory-to-visual emotional mapping. We further incorporate a multi-objective optimization strategy grounded in Russell's circumplex model of affect, jointly optimizing emotional alignment, color diversity, and palette coherence. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches across multiple quantitative metrics. Moreover, it exhibits superior expressiveness and practical utility in downstream applications including music-driven image recoloring, video generation, and visual analytics.
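As a rough illustration of the encoder-decoder design described above, the minimal PyTorch sketch below maps a sequence of audio features to a fixed-size palette. All module choices here (a GRU encoder, an MLP decoder, five RGB colors, the feature dimensions) are assumptions made for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class Music2PaletteSketch(nn.Module):
    """Hypothetical end-to-end mapping from audio features to a K-color palette.
    Module names and shapes are assumptions, not the paper's implementation."""
    def __init__(self, audio_dim=128, hidden_dim=256, num_colors=5):
        super().__init__()
        # Music encoder: summarizes a sequence of audio frames into one embedding.
        self.music_encoder = nn.GRU(audio_dim, hidden_dim, batch_first=True)
        # Color decoder: maps the music embedding to K RGB colors in [0, 1].
        self.color_decoder = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_colors * 3), nn.Sigmoid(),
        )
        self.num_colors = num_colors

    def forward(self, audio_frames):                # (B, T, audio_dim)
        _, h = self.music_encoder(audio_frames)     # h: (1, B, hidden_dim)
        palette = self.color_decoder(h.squeeze(0))  # (B, K * 3)
        return palette.view(-1, self.num_colors, 3)

# Example: a batch of 2 clips, 100 frames of 128-dim features each.
model = Music2PaletteSketch()
print(model(torch.randn(2, 100, 128)).shape)        # torch.Size([2, 5, 3])
```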
📝 Abstract
Emotion alignment between music and color palettes is crucial for effective multimedia content; misalignment creates confusion that weakens the intended message. Existing methods, however, often generate only a single dominant color, missing emotional variation, or rely on indirect mappings through text or images, losing crucial emotional detail. To address these challenges, we present Music2Palette, a novel method for emotion-aligned color palette generation via cross-modal representation learning. We first construct MuCED, a dataset of 2,634 expert-validated music-palette pairs aligned through Russell-based emotion vectors. To translate music directly into palettes, we propose a cross-modal representation learning framework with a music encoder and a color decoder. We further propose a multi-objective optimization approach that jointly enhances emotion alignment, color diversity, and palette coherence. Extensive experiments demonstrate that our method outperforms current approaches in interpreting music emotion and generating attractive, diverse color palettes. Our approach enables applications such as music-driven image recoloring, video generation, and data visualization, bridging the gap between auditory and visual emotional experiences.
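To make the multi-objective optimization concrete, the following sketch shows one plausible way the three objectives could be combined into a single training loss. The specific term formulations (cosine alignment of Russell valence-arousal vectors, pairwise color distance for diversity, a smoothness penalty for coherence) and the weights are illustrative assumptions, not the paper's definitions.

```python
import torch
import torch.nn.functional as F

def palette_loss(pred_palette, pred_emotion, target_emotion,
                 w_align=1.0, w_div=0.1, w_coh=0.1):
    """Illustrative combination of the three stated objectives.
    pred_palette: (B, K, 3) colors in [0, 1]; emotions: (B, D) Russell vectors.
    Exact loss terms and weights in the paper are assumed, not reproduced."""
    # Emotion alignment: match predicted and target Russell emotion vectors.
    align = 1.0 - F.cosine_similarity(pred_emotion, target_emotion, dim=-1).mean()

    # Color diversity: reward spread among colors within a palette
    # (negative mean pairwise distance, so minimizing increases diversity).
    diversity = -torch.cdist(pred_palette, pred_palette).mean()

    # Palette coherence: penalize abrupt jumps between neighboring colors.
    coherence = (pred_palette[:, 1:] - pred_palette[:, :-1]).pow(2).mean()

    return w_align * align + w_div * diversity + w_coh * coherence
```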