Probing and Inducing Combinational Creativity in Vision-Language Models

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether vision-language models (VLMs) possess genuine combinational creativity, the ability to generate novel semantic interpretations by blending existing concepts, rather than relying solely on pattern matching over training data. Method: the authors propose IEI (Identification-Explanation-Implication), a computationally grounded, three-level evaluation framework formalizing conceptual blending theory from cognitive science, and introduce CreativeMashup, an expert-annotated dataset of artist-generated visual mashups designed for assessing combinational creativity. Contribution/Results: experiments show that state-of-the-art VLMs surpass average humans on comprehension tasks but fall short of domain experts; when the IEI framework guides generation, VLMs produce outputs of significantly higher creative quality. The framework thus offers both a rigorous theoretical model for assessing artificial creativity and a systematic, controllable route to improving combinational reasoning in multimodal foundation models.

📝 Abstract
The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) like GPT-4V and DALL-E 3 have sparked debate about whether their outputs reflect combinational creativity, defined by M. A. Boden (1998) as synthesizing novel ideas through combining existing concepts, or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of conceptual blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that on comprehension tasks the best VLMs surpass average human performance while falling short of expert-level understanding, and that on generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLM outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.
Problem

Research questions and friction points this paper is trying to address.

Determining whether VLM outputs reflect genuine combinational creativity or pattern matching of training data
Lack of a computationally grounded framework for decomposing and evaluating creative processes
Improving the creative quality of VLM generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

IEI framework decomposes creative processes into identification, explanation, and implication levels
CreativeMashup, a 666-item expert-annotated visual mashup dataset, validates the framework
IEI-guided generation significantly improves the creative quality of VLM outputs
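The IEI-guided generation pipeline summarized above can be sketched as a three-stage prompting loop, one query per IEI level, with each stage conditioned on the previous one. This is a minimal illustrative sketch, not the paper's actual implementation: the `query_vlm` stub and all prompt wording are assumptions, and a real system would replace the stub with calls to an actual VLM API.

```python
def query_vlm(prompt: str) -> str:
    """Placeholder for a real VLM call; echoes the prompt for illustration."""
    return f"[VLM response to: {prompt[:40]}...]"


def iei_generate(concept_a: str, concept_b: str) -> dict:
    """Walk the three IEI levels before producing a final mashup description."""
    # Level 1: Identification -- name the two input spaces.
    identification = query_vlm(
        f"Identify the key visual concepts in '{concept_a}' and '{concept_b}'."
    )
    # Level 2: Explanation -- extract attributes the two concepts share.
    explanation = query_vlm(
        f"Explain which attributes '{concept_a}' and '{concept_b}' share. "
        f"Context: {identification}"
    )
    # Level 3: Implication -- derive the novel semantic blend.
    implication = query_vlm(
        f"Derive a novel semantic implication of blending '{concept_a}' with "
        f"'{concept_b}'. Shared attributes: {explanation}"
    )
    return {
        "identification": identification,
        "explanation": explanation,
        "implication": implication,
    }


result = iei_generate("owl", "teapot")
print(sorted(result.keys()))
```

Chaining the stages, rather than asking for a blend in one shot, is the point of the framework: the model is forced to commit to the input spaces and shared attributes before deriving the implication.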
👥 Authors
Yongqian Peng (Institute for Artificial Intelligence, Peking University; Yuanpei College, Peking University)
Yuxi Ma (Institute for Artificial Intelligence, Peking University; psychology, artificial intelligence, cognition)
Mengmeng Wang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yuxuan Wang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yizhou Wang (Center on Frontiers of Computing Studies, School of Computer Science, Peking University)
Chi Zhang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yixin Zhu (Assistant Professor, Peking University; Computer Vision, Visual Reasoning, Human-Robot Teaming)
Zilong Zheng (State Key Laboratory of General Artificial Intelligence, BIGAI)