Improving Generative Ad Text on Facebook using Reinforcement Learning

📅 2025-07-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study investigates how reinforcement learning (RL) can be applied to post-train large language models (LLMs) to improve Facebook ad copy generation, and quantifies the resulting economic impact. We propose RLPF (Reinforcement Learning with Performance Feedback), an RL-based fine-tuning method that leverages sparse, real-world business metrics, such as historical click-through rate (CTR), as reward signals for end-to-end ad copy optimization. To our knowledge, this is the first large-scale industrial validation demonstrating measurable improvements in actual business KPIs from RL-driven generative AI: in an A/B test spanning nearly 35,000 advertisers, RLPF increased CTR by 6.7%, significantly improved return on investment (ROI), and enhanced creative diversity. Our core contribution is a lightweight, feedback-driven RL post-training paradigm tailored to production environments, along with the first systematic quantification of RL's economic value in deploying generative AI commercially.
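
The summary above describes RLPF only at a high level. As a rough illustration of the core idea, using a historical performance metric such as CTR as a scalar reward for post-training a generator, here is a minimal REINFORCE-style sketch in Python. The softmax policy over rewrite "styles", the stub reward function, and all numbers are hypothetical placeholders for illustration; this is not AdLlama or the paper's actual training setup.

```python
# Minimal sketch (not the paper's implementation) of the RLPF idea:
# treat a historical performance metric such as CTR as a scalar reward
# and nudge a text-generation policy toward higher-reward outputs with
# a REINFORCE-style update. Everything below is an illustrative toy.
import numpy as np

rng = np.random.default_rng(0)

# Toy "policy": a softmax over a few candidate rewrite styles. In the
# real system the policy would be an LLM producing full ad text.
styles = ["benefit-led", "urgency", "question-hook", "social-proof"]
logits = np.zeros(len(styles))

def historical_ctr_reward(style_idx: int) -> float:
    """Stub reward: a noisy CTR-like score per style (made-up numbers).
    In RLPF the reward is derived from historical ad performance data."""
    base_ctr = [0.011, 0.013, 0.010, 0.015][style_idx]
    return float(rng.normal(base_ctr, 0.002))

def softmax(x: np.ndarray) -> np.ndarray:
    z = np.exp(x - x.max())
    return z / z.sum()

learning_rate = 20.0
baseline = 0.0  # running mean reward, used as a variance-reducing baseline

for _ in range(5000):
    probs = softmax(logits)
    action = int(rng.choice(len(styles), p=probs))
    reward = historical_ctr_reward(action)

    # REINFORCE: push up the log-probability of actions whose reward
    # beats the baseline; gradient of log softmax is (one_hot - probs).
    advantage = reward - baseline
    grad = -probs
    grad[action] += 1.0
    logits += learning_rate * advantage * grad
    baseline = 0.99 * baseline + 0.01 * reward

# The policy should now favor the styles with higher (stub) CTR.
print({s: round(float(p), 3) for s, p in zip(styles, softmax(logits))})
```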

๐Ÿ“ Abstract
Generative artificial intelligence (AI), in particular large language models (LLMs), is poised to drive transformative economic change. LLMs are pre-trained on vast text data to learn general language patterns, but a subsequent post-training phase is critical to align them for specific real-world tasks. Reinforcement learning (RL) is the leading post-training technique, yet its economic impact remains largely underexplored and unquantified. We examine this question through the lens of the first deployment of an RL-trained LLM for generative advertising on Facebook. Integrated into Meta's Text Generation feature, our model, "AdLlama," powers an AI tool that helps advertisers create new variations of human-written ad text. To train this model, we introduce reinforcement learning with performance feedback (RLPF), a post-training method that uses historical ad performance data as a reward signal. In a large-scale 10-week A/B test on Facebook spanning nearly 35,000 advertisers and 640,000 ad variations, we find that AdLlama improves click-through rates by 6.7% (p=0.0296) compared to a supervised imitation model trained on curated ads. This represents a substantial improvement in advertiser return on investment on Facebook. We also find that advertisers who used AdLlama generated more ad variations, indicating higher satisfaction with the model's outputs. To our knowledge, this is the largest study to date on the use of generative AI in an ecologically valid setting, offering an important data point quantifying the tangible impact of RL post-training. Furthermore, the results show that RLPF is a promising and generalizable approach for metric-driven post-training that bridges the gap between highly capable language models and tangible outcomes.
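
The abstract reports a 6.7% relative CTR improvement with p=0.0296 from a two-arm A/B test. The paper's exact statistical procedure is not described here; as an illustration, a standard way to assess such a difference is a two-proportion z-test, sketched below with placeholder click and impression counts that are not the paper's data.

```python
# Sketch of a two-proportion z-test for comparing CTR between a treatment
# arm (e.g., the RL-trained model) and a control arm (e.g., the supervised
# imitation model). The counts below are placeholders, not the paper's data.
from math import erfc, sqrt

def two_proportion_z_test(clicks_t: int, imps_t: int, clicks_c: int, imps_c: int):
    p_t = clicks_t / imps_t
    p_c = clicks_c / imps_c
    p_pool = (clicks_t + clicks_c) / (imps_t + imps_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / imps_t + 1 / imps_c))
    z = (p_t - p_c) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided p-value, normal approximation
    return p_t, p_c, z, p_value

# Hypothetical arms: control CTR of 1.00% vs. treatment CTR of about 1.067%,
# i.e., a 6.7% relative lift like the one reported in the abstract.
p_t, p_c, z, p = two_proportion_z_test(10_670, 1_000_000, 10_000, 1_000_000)
print(f"treatment CTR={p_t:.3%}  control CTR={p_c:.3%}  z={z:.2f}  p={p:.2g}")
```
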
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs for real-world ad text generation tasks
Quantifying economic impact of RL post-training on ad performance
Bridging gap between language models and tangible advertising outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning with performance feedback (RLPF)
AdLlama model for generative ad text
Large-scale A/B testing on Facebook
Daniel R. Jiang
Research Scientist, Meta; Adjunct Professor, University of Pittsburgh
reinforcement learning · sequential decision making · Bayesian optimization
Alex Nikulkov
Meta Platforms, Menlo Park, California, USA.
Yu-Chia Chen
Meta Platforms, Menlo Park, California, USA.
Yang Bai
Meta Platforms, Menlo Park, California, USA.
Zheqing Zhu
Meta Platforms, Menlo Park, California, USA.