🤖 AI Summary
This study addresses the subjectivity and cognitive complexity inherent in assessing visual advertising creativity. We formalize creativity as two annotatable dimensions—*atypicality* and *originality*—and construct a fine-grained, human-annotated benchmark on them. We propose a multimodal evaluation task tailored to subjective content assessment, systematically benchmarking vision-language models (VLMs) on cross-modal semantic understanding and evaluation consistency against high-quality human annotations. Results show that state-of-the-art VLMs approach human performance in atypicality recognition but exhibit substantial deficits in originality judgment. Our primary contribution is a dual-dimensional quantitative framework for advertising creativity evaluation, which empirically delineates the capabilities and limitations of VLMs on subjective, high-level cognitive tasks—and thereby identifies concrete avenues for improvement.
📝 Abstract
Evaluating creativity is challenging, even for humans, not only because of its subjectivity but also because it involves complex cognitive processes. Inspired by work in marketing, we attempt to break down visual advertisement creativity into atypicality and originality. With fine-grained human annotations on these dimensions, we propose a suite of tasks specifically for such a subjective problem. We also evaluate the alignment between state-of-the-art (SoTA) vision-language models (VLMs) and humans on our proposed benchmark, demonstrating both the promise and the challenges of using VLMs for automatic creativity assessment.