RELATE: A Reinforcement Learning-Enhanced LLM Framework for Advertising Text Generation

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional two-stage ad copy generation paradigms, which struggle to align downstream conversion objectives—such as click-through conversion rate (CTCVR)—with regulatory compliance constraints, often yielding suboptimal performance. To overcome this, we propose RELATE, a novel framework that jointly models deep conversion metrics and compliance requirements as a multidimensional reward signal. Leveraging large language model–driven reinforcement learning, RELATE enables end-to-end policy optimization with built-in alignment to target objectives during text generation. Evaluated on large-scale industrial datasets, our approach significantly outperforms existing baselines. Furthermore, online deployment demonstrates a statistically significant improvement in CTCVR while strictly adhering to compliance standards.

📝 Abstract
In online advertising, ad text plays a critical role in attracting user engagement and driving advertiser value. Existing industrial systems typically follow a two-stage paradigm, where candidate texts are first generated and subsequently aligned with online performance metrics such as click-through rate (CTR). This separation often leads to misaligned optimization objectives and low funnel efficiency, limiting global optimality. To address these limitations, we propose RELATE, a reinforcement learning-based end-to-end framework that unifies generation and objective alignment within a single model. Instead of decoupling text generation from downstream metric alignment, RELATE integrates performance and compliance objectives directly into the generation process via policy learning. To better capture ultimate advertiser value beyond click-level signals, we incorporate conversion-oriented metrics into the objective and jointly model them with compliance constraints as multi-dimensional rewards, enabling the model to generate high-quality ad texts that improve conversion performance under policy constraints. Extensive experiments on large-scale industrial datasets demonstrate that RELATE consistently outperforms baselines. Furthermore, online deployment on a production advertising platform yields statistically significant improvements in click-through conversion rate (CTCVR) under strict policy constraints, validating the robustness and real-world effectiveness of the proposed framework.
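The abstract does not give the reward formula, so the following is only a minimal sketch of what "jointly modeling conversion metrics and compliance constraints as multi-dimensional rewards" could look like when collapsed into a single scalar for policy learning. The names `RewardWeights` and `combined_reward`, the weight values, and the additive penalty form are all illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract gives no values or formula.
    conversion: float = 1.0
    compliance_penalty: float = 2.0


def combined_reward(
    predicted_ctcvr: float,
    is_compliant: bool,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Collapse a conversion score and a compliance flag into one scalar.

    predicted_ctcvr: assumed output of some conversion predictor, in [0, 1].
    is_compliant: assumed output of some policy/compliance checker.
    """
    if not 0.0 <= predicted_ctcvr <= 1.0:
        raise ValueError("predicted_ctcvr must lie in [0, 1]")
    reward = weights.conversion * predicted_ctcvr
    if not is_compliant:
        # Assumed design: a violation costs more than any conversion gain
        # can recover, so the policy is never paid for breaking rules.
        reward -= weights.compliance_penalty
    return reward
```

With these assumed defaults, a non-compliant text always scores below a compliant one, whatever its predicted conversion, which is one simple way to express "improve conversion under policy constraints".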
Problem

Research questions and friction points this paper is trying to address.

advertising text generation
optimization misalignment
funnel efficiency
conversion performance
policy constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Learning
End-to-end Framework
Advertising Text Generation
Conversion Optimization
Policy Constraints
Jinfang Wang
Baidu Inc., Beijing, China
Jiajie Liu
Peking University
Computer Vision
Jianwei Wu
Baidu Inc., Beijing, China
Ziqin Luo
Fudan University
LLMs
Zhen Chen
Baidu Inc., Beijing, China
Chunlei Li
Harbin Institute of Technology
Evolutionary Computation, Multi-objective optimization
Biao Han
Baidu Inc., Beijing, China
Tao Deng
Baidu Inc., Beijing, China
Yi Li
Baidu Inc., Beijing, China
Shuanglong Li
Baidu Inc., Beijing, China
Lin Liu
Baidu Inc., Beijing, China