One Size, Many Fits: Aligning Diverse Group-Wise Click Preferences in Large-Scale Advertising Image Generation

📅 2026-02-02

🤖 AI Summary

This work addresses a limitation of existing advertising image generation methods. They adopt a one-size-fits-all strategy that ignores differences in click preferences between user groups, which leads to suboptimal performance for some user segments. To overcome this, the authors propose OSMF, a unified framework that generates personalized ad content through product-aware adaptive grouping and preference-conditioned image synthesis. The key contributions are:
- the first mechanism for aligning image generation with group-level click preferences;
- GAIP, the first large-scale dataset of group-specific advertising image preferences;
- Group-DPO, a preference optimization method applied to a group-aware multimodal large language model (G-MLLM).

Both offline evaluations and online experiments show that the approach improves click-through rates across diverse user groups and achieves state-of-the-art performance.
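Group-DPO needs preference data in which the same pair of images can be ranked differently by different groups. The summary does not say how GAIP's preference pairs are derived. What follows is therefore only a minimal Python sketch under an assumed recipe: per-group impression and click counts are turned into (group, chosen, rejected) triples by comparing CTRs. The `GroupStat` class, `build_pairs`, and the `min_impressions` and `min_gap` thresholds are hypothetical. The counts are invented.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class GroupStat:
    """Hypothetical per-(group, image) exposure statistics."""
    group: str
    image: str
    impressions: int
    clicks: int

    @property
    def ctr(self) -> float:
        # Click-through rate; zero when the image was never shown.
        return self.clicks / self.impressions if self.impressions else 0.0


def build_pairs(stats, min_impressions=100, min_gap=0.005):
    """Turn per-group CTR statistics into (group, chosen, rejected) triples.

    Images with fewer than `min_impressions` impressions are dropped, and
    pairs whose CTRs differ by less than `min_gap` are skipped as too noisy.
    """
    by_group = {}
    for s in stats:
        if s.impressions >= min_impressions:
            by_group.setdefault(s.group, []).append(s)

    pairs = []
    for group, items in by_group.items():
        for a, b in combinations(items, 2):
            if abs(a.ctr - b.ctr) < min_gap:
                continue
            chosen, rejected = (a, b) if a.ctr > b.ctr else (b, a)
            pairs.append((group, chosen.image, rejected.image))
    return pairs


# Invented counts for illustration only.
stats = [
    GroupStat("g1", "img_A", 1000, 50),
    GroupStat("g1", "img_B", 1000, 30),
    GroupStat("g1", "img_C", 50, 10),  # too few impressions, ignored
    GroupStat("g2", "img_A", 1000, 20),
    GroupStat("g2", "img_B", 1000, 35),
    GroupStat("g2", "img_C", 1000, 21),
]
for pair in build_pairs(stats):
    print(pair)
# ('g1', 'img_A', 'img_B')
# ('g2', 'img_B', 'img_A')
# ('g2', 'img_B', 'img_C')
```

Here img_A is the chosen image in g1 but the rejected one in g2. A single global ranking would average that disagreement away.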

📝 Abstract

Advertising image generation has increasingly focused on online metrics such as Click-Through Rate (CTR), yet existing approaches adopt a "one-size-fits-all" strategy that optimizes for overall CTR while neglecting preference diversity among user groups. This leads to suboptimal performance for specific groups, limiting targeted marketing effectiveness. To bridge this gap, we present One Size, Many Fits (OSMF), a unified framework that aligns diverse group-wise click preferences in large-scale advertising image generation. OSMF begins with product-aware adaptive grouping, which dynamically organizes users based on their attributes and product characteristics, representing each group with rich collective preference features. Building on these groups, preference-conditioned image generation employs a Group-aware Multimodal Large Language Model (G-MLLM) to generate tailored images for each group. The G-MLLM is pre-trained to simultaneously comprehend group features and generate advertising images. Subsequently, we fine-tune the G-MLLM with our proposed Group-DPO for group-wise preference alignment, which effectively enhances each group's CTR on the generated images. To further advance this field, we introduce the Grouped Advertising Image Preference Dataset (GAIP), the first large-scale public dataset of group-wise image preferences, comprising around 600K groups built from 40M users. Extensive experiments demonstrate that our framework achieves state-of-the-art performance in both offline and online settings. Our code and datasets will be released at https://github.com/JD-GenX/OSMF.
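The abstract does not give the Group-DPO objective. As a point of reference only, the sketch below implements the standard DPO loss. The policy and reference log-probabilities are assumed to be conditioned on the same product-and-group context. On top of that, the sketch adds a hypothetical group-balanced average so that groups with many pairs do not dominate. The function names, the `beta` default, and the equal per-group weighting are assumptions, not the paper's formulation.

```python
import math
from collections import defaultdict


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Arguments are log-probabilities of the chosen and rejected images under
    the trainable policy and the frozen reference model; here both are
    assumed to be conditioned on the same (product, group) context.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -log_sigmoid(beta * margin)


def group_balanced_loss(pairs, beta=0.1):
    """Mean DPO loss within each group, then mean over groups.

    `pairs` holds (group_id, pol_c, pol_r, ref_c, ref_r) tuples. Weighting
    every group equally is a hypothetical choice for this sketch.
    """
    per_group = defaultdict(list)
    for group, pc, pr, rc, rr in pairs:
        per_group[group].append(dpo_pair_loss(pc, pr, rc, rr, beta))
    group_means = [sum(v) / len(v) for v in per_group.values()]
    return sum(group_means) / len(group_means)


# Made-up log-probabilities: g1's policy already favors the chosen image
# relative to the reference; g2's policy is indistinguishable from it.
example_pairs = [
    ("g1", -10.0, -12.0, -11.0, -11.0),
    ("g2", -9.0, -9.0, -9.0, -9.0),
]
print(f"{group_balanced_loss(example_pairs):.4f}")
```

When the policy and reference agree (a zero margin), the pair loss is log 2 ≈ 0.693. Pushing the policy toward each group's chosen image lowers it.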

Problem

Research questions and friction points this paper is trying to address.

advertising image generation

click-through rate (CTR)

preference diversity

user groups

targeted marketing
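The problem keywords above reduce to a simple arithmetic point: the image with the best pooled CTR need not be the best image for every group. The Python sketch below illustrates this. It uses invented click counts for two hypothetical groups (`young_urban`, `families`) and two hypothetical images, and shows how serving one image to everyone leaves clicks on the table.

```python
from collections import defaultdict

# Invented click logs: (group, image, impressions, clicks).
LOGS = [
    ("young_urban", "img_A", 1000, 50),
    ("young_urban", "img_B", 1000, 30),
    ("families", "img_A", 1000, 20),
    ("families", "img_B", 1000, 35),
]


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions if impressions else 0.0


def best_overall(logs):
    """Image with the highest CTR when all groups are pooled."""
    totals = defaultdict(lambda: [0, 0])  # image -> [impressions, clicks]
    for _, image, imps, clicks in logs:
        totals[image][0] += imps
        totals[image][1] += clicks
    return max(totals, key=lambda img: ctr(totals[img][1], totals[img][0]))


def best_per_group(logs):
    """Image with the highest CTR within each group."""
    best = {}  # group -> (image, ctr)
    for group, image, imps, clicks in logs:
        rate = ctr(clicks, imps)
        if group not in best or rate > best[group][1]:
            best[group] = (image, rate)
    return {group: image for group, (image, _) in best.items()}


def served_ctr(logs, assignment):
    """Pooled CTR when each group is shown only assignment[group]."""
    imps = clicks = 0
    for group, image, i, c in logs:
        if assignment[group] == image:
            imps += i
            clicks += c
    return ctr(clicks, imps)


groups = {group for group, *_ in LOGS}
one_size = {g: best_overall(LOGS) for g in groups}
tailored = best_per_group(LOGS)
print(best_overall(LOGS))                    # img_A
print(tailored)                              # {'young_urban': 'img_A', 'families': 'img_B'}
print(f"{served_ctr(LOGS, one_size):.4f}")   # 0.0350
print(f"{served_ctr(LOGS, tailored):.4f}")   # 0.0425
```

Serving img_A to everyone wins on pooled CTR. Yet it costs the families group 15 clicks per 1,000 impressions relative to img_B.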

Innovation

Methods, ideas, or system contributions that make the work stand out.

group-wise preference alignment

adaptive user grouping

Group-aware Multimodal LLM
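Adaptive user grouping is described only at a high level: users are organized by their attributes and the product's characteristics, and each group is summarized by collective preference features. The Python sketch below is one hypothetical reading, not the paper's algorithm. It makes three assumptions:
- the product category selects which attributes to group on (`GROUPING_KEYS` and `adaptive_groups` are invented names);
- undersized buckets are merged into a fallback group;
- each group's collective feature is the element-wise mean of its members' feature vectors.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical mapping from product category to the user attributes
# that define groups for that category.
GROUPING_KEYS = {
    "cosmetics": ("gender", "age_band"),
    "electronics": ("age_band", "city_tier"),
}


def adaptive_groups(users, product_category, min_size=2):
    """Group users by category-specific attributes and summarize each group.

    `users` is a list of dicts holding attribute fields plus a numeric
    'features' vector. Buckets smaller than `min_size` are merged into a
    single ('other',) fallback group. Returns {group_key: mean feature vector}.
    """
    keys = GROUPING_KEYS.get(product_category, ("age_band",))
    buckets = defaultdict(list)
    for user in users:
        buckets[tuple(user[k] for k in keys)].append(user["features"])

    groups, fallback = {}, []
    for key, members in buckets.items():
        if len(members) >= min_size:
            groups[key] = members
        else:
            fallback.extend(members)
    if fallback:
        groups[("other",)] = fallback

    # Collective preference feature: element-wise mean over the members.
    return {
        key: [fmean(column) for column in zip(*members)]
        for key, members in groups.items()
    }


users = [
    {"gender": "f", "age_band": "18-24", "city_tier": 1, "features": [1.0, 0.0]},
    {"gender": "f", "age_band": "18-24", "city_tier": 2, "features": [0.0, 1.0]},
    {"gender": "m", "age_band": "35-44", "city_tier": 1, "features": [0.2, 0.4]},
]
print(adaptive_groups(users, "cosmetics"))
# {('f', '18-24'): [0.5, 0.5], ('other',): [0.2, 0.4]}
print(list(adaptive_groups(users, "electronics")))
# [('other',)]
```

The same three users form two groups for a cosmetics ad but collapse into one fallback group for an electronics ad. In that sense, the grouping in this sketch is product-aware.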