🤖 AI Summary
This work investigates cross-modal mapping between taste and audition, addressing the challenge of semantically controlled music generation conditioned on fine-grained taste attributes (e.g., sweet, sour, bitter).
Method: We propose an end-to-end fine-tuning framework built upon MusicGEN, with a customized text encoder specifically designed to model taste semantics. We introduce Taste2Music—the first open-source, human-annotated taste–music alignment dataset—to support training and evaluation.
Contribution/Results: Our approach achieves the first semantic, attribute-controllable generation of audio from taste descriptors. We establish a human perception-driven evaluation protocol and conduct a large-scale subjective study (N=111). Results show statistically significant improvements in taste–music consistency (p<0.01) and accurate conveyance of core taste dimensions. This work pioneers a new paradigm for cross-modal perceptual modeling and advances generative AI applications in sensory science.
📝 Abstract
In recent decades, neuroscientific and psychological research has traced direct relationships between taste and auditory perception. Building on this foundational research, this article explores multimodal generative models capable of converting taste information into music. We provide a brief review of the state of the art in this field, highlighting key findings and methodologies. We present an experiment in which a fine-tuned version of a generative music model (MusicGEN) is used to generate music based on detailed taste descriptions provided for each musical piece. The results are promising: according to the participants' ($n=111$) evaluations, the fine-tuned model produces music that more coherently reflects the input taste descriptions than the non-fine-tuned model. This study represents a significant step towards understanding and developing embodied interactions between AI, sound, and taste, opening new possibilities in the field of generative AI. We release our dataset, code, and pre-trained model at: https://osf.io/xs5jy/.
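As a minimal illustration of how fine-grained taste attributes could be turned into the kind of textual conditioning a text-to-music model like MusicGEN consumes, the sketch below maps annotated attribute strengths to a prompt string. The attribute names, scores, and weighting scheme are hypothetical, not the paper's actual Taste2Music dataset schema or pipeline.

```python
# Hypothetical sketch: converting annotated taste-attribute scores into a
# conditioning prompt for a text-to-music model such as MusicGEN.
# Attribute names, thresholds, and phrasing are illustrative only.

def taste_to_prompt(attributes: dict[str, float], threshold: float = 0.3) -> str:
    """Keep salient taste dimensions and order them by strength."""
    salient = sorted(
        ((name, score) for name, score in attributes.items() if score >= threshold),
        key=lambda item: item[1],
        reverse=True,
    )
    if not salient:
        return "a neutral taste"
    # Emphasize dominant dimensions; list weaker ones plainly.
    descriptors = [f"strongly {n}" if s >= 0.7 else n for n, s in salient]
    return "a taste that is " + ", ".join(descriptors)

prompt = taste_to_prompt({"sweet": 0.9, "sour": 0.4, "bitter": 0.1})
# prompt == "a taste that is strongly sweet, sour"
```

A string like this would then be passed as the text prompt to the (fine-tuned) generation model; the actual conditioning interface used in the study may differ.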