Probing and Inducing Combinational Creativity in Vision-Language Models

📅 2025-04-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether vision-language models (VLMs) possess genuine combinational creativity, the ability to generate novel semantic interpretations by blending existing concepts, rather than relying solely on pattern matching over training data. Method: the authors propose IEI (Identification-Explanation-Implication), a computationally grounded, three-level evaluation framework formalizing conceptual blending theory from cognitive science, and introduce CreativeMashup, an expert-annotated dataset of artist-generated visual mashups designed for assessing combinational creativity. Contribution/Results: experiments show that state-of-the-art VLMs surpass average humans on comprehension tasks but fall short of domain experts; when the IEI framework guides generation, VLMs produce outputs of significantly higher creative quality. The framework thus offers both a rigorous theoretical model for assessing artificial creativity and a systematic, controllable route to improving combinational reasoning in multimodal foundation models.

📝 Abstract
The ability to combine existing concepts into novel ideas stands as a fundamental hallmark of human intelligence. Recent advances in Vision-Language Models (VLMs) like GPT-4V and DALL-E 3 have sparked debate about whether their outputs reflect combinational creativity, defined by M. A. Boden (1998) as synthesizing novel ideas through combining existing concepts, or sophisticated pattern matching of training data. Drawing inspiration from cognitive science, we investigate the combinational creativity of VLMs through the lens of conceptual blending. We propose the Identification-Explanation-Implication (IEI) framework, which decomposes creative processes into three levels: identifying input spaces, extracting shared attributes, and deriving novel semantic implications. To validate this framework, we curate CreativeMashup, a high-quality dataset of 666 artist-generated visual mashups annotated according to the IEI framework. Through extensive experiments, we demonstrate that on comprehension tasks the best VLMs surpass average human performance while falling short of expert-level understanding, and that on generation tasks, incorporating our IEI framework into the generation pipeline significantly enhances the creative quality of VLM outputs. Our findings establish both a theoretical foundation for evaluating artificial creativity and practical guidelines for improving creative generation in VLMs.
Problem

Research questions and friction points this paper is trying to address.

Determining whether VLM outputs reflect genuine combinational creativity or pattern matching of training data
Lack of a computationally grounded framework for decomposing and evaluating creative processes
Improving the creative quality of VLM generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

IEI framework decomposes creative processes into identification, explanation, and implication levels
CreativeMashup, a 666-item expert-annotated visual mashup dataset, validates the framework
IEI-guided generation significantly improves the creative quality of VLM outputs
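The IEI-guided generation pipeline summarized above can be sketched as a three-stage prompting loop, one query per IEI level, with each stage conditioned on the previous one. This is a minimal illustrative sketch, not the paper's actual implementation: the `query_vlm` stub and all prompt wording are assumptions, and a real system would replace the stub with calls to an actual VLM API.

```python
def query_vlm(prompt: str) -> str:
    """Placeholder for a real VLM call; echoes the prompt for illustration."""
    return f"[VLM response to: {prompt[:40]}...]"


def iei_generate(concept_a: str, concept_b: str) -> dict:
    """Walk the three IEI levels before producing a final mashup description."""
    # Level 1: Identification -- name the two input spaces.
    identification = query_vlm(
        f"Identify the key visual concepts in '{concept_a}' and '{concept_b}'."
    )
    # Level 2: Explanation -- extract attributes the two concepts share.
    explanation = query_vlm(
        f"Explain which attributes '{concept_a}' and '{concept_b}' share. "
        f"Context: {identification}"
    )
    # Level 3: Implication -- derive the novel semantic blend.
    implication = query_vlm(
        f"Derive a novel semantic implication of blending '{concept_a}' with "
        f"'{concept_b}'. Shared attributes: {explanation}"
    )
    return {
        "identification": identification,
        "explanation": explanation,
        "implication": implication,
    }


result = iei_generate("owl", "teapot")
print(sorted(result.keys()))
```

Chaining the stages, rather than asking for a blend in one shot, is the point of the framework: the model is forced to commit to the input spaces and shared attributes before deriving the implication.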
👥 Authors
Yongqian Peng (Institute for Artificial Intelligence, Peking University; Yuanpei College, Peking University)
Yuxi Ma (Institute for Artificial Intelligence, Peking University; psychology, artificial intelligence, cognition)
Mengmeng Wang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yuxuan Wang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yizhou Wang (Center on Frontiers of Computing Studies, School of Computer Science, Peking University)
Chi Zhang (State Key Laboratory of General Artificial Intelligence, BIGAI)
Yixin Zhu (Assistant Professor, Peking University; Computer Vision, Visual Reasoning, Human-Robot Teaming)
Zilong Zheng (State Key Laboratory of General Artificial Intelligence, BIGAI)