🤖 AI Summary
Current evaluation of advertising images relies heavily on subjective judgment and therefore lacks scalability, standardization, and interpretability. To address this limitation, this work proposes the A³ framework, which comprises a theoretical paradigm (A³-Law), a multidimensional annotated dataset (A³-Dataset), a multimodal large language model (A³-Align), and an evaluation benchmark (A³-Bench). The framework introduces the first aesthetics assessment system for advertisements grounded in a three-tier structure: Perceptual Attention, Formal Interest, and Desire Impact. A³-Align is trained with chain-of-thought guided learning, significantly outperforms existing methods on A³-Bench, and generalizes well to both ad preference ranking and diagnostic tasks.
📝 Abstract
Advertising images significantly impact commercial conversion rates and brand equity, yet current evaluation methods rely on subjective judgments and lack scalability, standardized criteria, and interpretability. To address these challenges, we present A³ (Advertising Aesthetic Assessment), a comprehensive framework encompassing four components: a paradigm (A³-Law), a dataset (A³-Dataset), a multimodal large language model (A³-Align), and a benchmark (A³-Bench). Central to A³ is a theory-driven paradigm, A³-Law, comprising three hierarchical stages: (1) Perceptual Attention, evaluating perceptual image signals for their ability to attract attention; (2) Formal Interest, assessing how the formal composition of image color and spatial layout evokes interest; and (3) Desire Impact, measuring how strongly an image evokes desire and its persuasive impact. Building on A³-Law, we construct A³-Dataset with 120K instruction-response pairs from 30K advertising images, each richly annotated with multi-dimensional labels and Chain-of-Thought (CoT) rationales. We further develop A³-Align, trained under A³-Law with CoT-guided learning on A³-Dataset. Extensive experiments on A³-Bench demonstrate that A³-Align achieves superior alignment with A³-Law compared to existing models, and that this alignment generalizes well to quality advertisement selection and prescriptive advertisement critique, indicating its potential for broader deployment. Dataset, code, and models can be found at: https://github.com/euleryuan/A3-Align.