🤖 AI Summary
This work addresses the limitations of traditional two-stage ad copy generation paradigms, which struggle to align downstream conversion objectives—such as click-through conversion rate (CTCVR)—with regulatory compliance constraints, often yielding suboptimal performance. To overcome this, we propose RELATE, a novel framework that jointly models deep conversion metrics and compliance requirements as a multidimensional reward signal. Leveraging large language model–driven reinforcement learning, RELATE enables end-to-end policy optimization with built-in alignment to target objectives during text generation. Evaluated on large-scale industrial datasets, our approach significantly outperforms existing baselines. Furthermore, online deployment demonstrates a statistically significant improvement in CTCVR while strictly adhering to compliance standards.
📝 Abstract
In online advertising, advertising text plays a critical role in attracting user engagement and driving advertiser value. Existing industrial systems typically follow a two-stage paradigm, where candidate texts are first generated and subsequently aligned with online performance metrics such as click-through rate(CTR). This separation often leads to misaligned optimization objectives and low funnel efficiency, limiting global optimality. To address these limitations, we propose RELATE, a reinforcement learning-based end-to-end framework that unifies generation and objective alignment within a single model. Instead of decoupling text generation from downstream metric alignment, RELATE integrates performance and compliance objectives directly into the generation process via policy learning. To better capture ultimate advertiser value beyond click-level signals, We incorporate conversion-oriented metrics into the objective and jointly model them with compliance constraints as multi-dimensional rewards, enabling the model to generate high-quality ad texts that improve conversion performance under policy constraints. Extensive experiments on large-scale industrial datasets demonstrate that RELATE consistently outperforms baselines. Furthermore, online deployment on a production advertising platform yields statistically significant improvements in click-through conversion rate(CTCVR) under strict policy constraints, validating the robustness and real-world effectiveness of the proposed framework.