DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs

📅 2026-01-21
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the challenge of optimizing advertisers’ cumulative value under strict budget constraints in low-data (few-shot) regimes in online advertising. To this end, the authors propose DARA, a two-stage framework: the first stage leverages the in-context learning capability of large language models (LLMs) to generate an initial campaign plan, and the second stage refines this plan through feedback-driven reasoning for precise numerical optimization. The approach combines the few-shot generalization strength of LLMs with reinforcement-learning fine-tuning, introducing GRPO-Adaptive, a post-training strategy that dynamically updates the reference policy during training. By decoupling decision making into distinct reasoning and optimization phases, DARA outperforms existing baselines on both real-world and synthetic datasets, consistently improving advertisers’ cumulative value under stringent budget limitations.
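The two-stage split described above can be pictured as a simple control loop: an in-context reasoner drafts a budget allocation plan from a handful of historical campaigns, and an optimizer then adjusts the plan using feedback until it is budget-feasible and higher-value. The sketch below is a hypothetical illustration only; the function names (`draft_plan_via_icl`, `refine_with_feedback`), the prompt format, and the numeric refinement rule are assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch of DARA's dual-phase decision loop (not the authors' code).
# Phase 1: an LLM drafts an initial budget plan from few-shot campaign examples.
# Phase 2: the plan is refined with feedback until it stays within budget and gains value.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Campaign:
    budget: float        # total spend allowed for the campaign
    history: List[str]   # few-shot examples of past (state, allocation, value) records


def draft_plan_via_icl(llm: Callable[[str], str], campaign: Campaign, n_periods: int) -> List[float]:
    """Phase 1 (few-shot reasoner): ask the LLM for an initial per-period budget split."""
    prompt = (
        "You allocate an advertising budget across time periods.\n"
        "Past campaigns:\n" + "\n".join(campaign.history) +
        f"\nTotal budget: {campaign.budget}. Periods: {n_periods}.\n"
        "Return one number per period, comma-separated."
    )
    raw = llm(prompt)
    return [float(x) for x in raw.split(",")][:n_periods]


def refine_with_feedback(plan: List[float], budget: float,
                         value_fn: Callable[[List[float]], float],
                         steps: int = 10, lr: float = 0.05) -> List[float]:
    """Phase 2 (fine-grained optimizer): nudge the plan toward higher value while
    re-projecting total spend onto the budget after every candidate move."""
    best, best_value = list(plan), value_fn(plan)
    for _ in range(steps):
        for i in range(len(best)):
            candidate = best.copy()
            candidate[i] *= (1.0 + lr)                               # perturb one period
            scale = min(1.0, budget / max(sum(candidate), 1e-9))     # enforce budget
            candidate = [b * scale for b in candidate]
            v = value_fn(candidate)
            if v > best_value:
                best, best_value = candidate, v
    return best
```

In the paper the second phase is itself an LLM performing feedback-driven reasoning rather than a numeric coordinate search; the sketch only illustrates the separation between drafting a plan and refining it under the budget constraint.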

📝 Abstract
Optimizing the advertiser's cumulative value of winning impressions under budget constraints poses a complex challenge in online advertising, under the paradigm of AI-Generated Bidding (AIGB). Advertisers often have personalized objectives but limited historical interaction data, resulting in few-shot scenarios where traditional reinforcement learning (RL) methods struggle to perform effectively. Large Language Models (LLMs) offer a promising alternative for AIGB by leveraging their in-context learning capabilities to generalize from limited data. However, they lack the numerical precision required for fine-grained optimization. To address this limitation, we introduce GRPO-Adaptive, an efficient LLM post-training strategy that enhances both reasoning and numerical precision by dynamically updating the reference policy during training. Built upon this foundation, we further propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages: a few-shot reasoner that generates initial plans via in-context prompting, and a fine-grained optimizer that refines these plans using feedback-driven reasoning. This separation allows DARA to combine LLMs' in-context learning strengths with the precise adaptability required by AIGB tasks. Extensive experiments on both real-world and synthetic data environments demonstrate that our approach consistently outperforms existing baselines in terms of cumulative advertiser value under budget constraints.
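The distinguishing ingredient named in the abstract is GRPO-Adaptive: unlike standard GRPO, the reference policy used for regularization is not frozen but dynamically updated during training. Below is a minimal sketch of that idea, assuming a periodic hard re-sync of the reference to the current policy, group-normalized advantages, and a per-sample KL estimator; the sync interval, coefficient, and helper names are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of the GRPO-Adaptive idea (assumed details, not the authors' code):
# GRPO uses group-relative advantages plus a KL penalty to a reference policy; here the
# reference is periodically re-synced to the current policy instead of staying frozen.

import copy
import math
from typing import List


def group_relative_advantages(rewards: List[float]) -> List[float]:
    """GRPO-style advantage: normalize each sampled completion's reward within its group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8
    return [(r - mean) / std for r in rewards]


def grpo_objective(group_rewards: List[float], logprobs: List[float],
                   ref_logprobs: List[float], kl_coef: float = 0.04) -> float:
    """Schematic per-group surrogate objective: advantage-weighted log-likelihood
    minus a KL penalty toward the (possibly re-synced) reference policy."""
    advantages = group_relative_advantages(group_rewards)
    objective = 0.0
    for adv, lp, ref_lp in zip(advantages, logprobs, ref_logprobs):
        # Non-negative per-sample KL estimate between policy and reference.
        kl = math.exp(ref_lp - lp) - (ref_lp - lp) - 1.0
        objective += adv * lp - kl_coef * kl
    return objective / len(group_rewards)


def train(policy, sync_every: int = 100, total_steps: int = 1000):
    """Training-loop skeleton: sampling, scoring, and the gradient update are omitted;
    the point is that the reference tracks the current policy ('adaptive' reference)."""
    reference = copy.deepcopy(policy)
    for step in range(total_steps):
        # ... sample a group of completions, score them, compute grpo_objective, update policy ...
        if (step + 1) % sync_every == 0:
            reference = copy.deepcopy(policy)  # dynamically update the reference policy
    return policy, reference
```

One common rationale for a moving reference is that a permanently frozen reference over-constrains the policy late in training, while no reference at all removes regularization; periodically re-syncing it keeps each update locally regularized without pinning the policy to its starting point.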
Problem

Research questions and friction points this paper is trying to address.

few-shot learning
budget allocation
online advertising
AI-Generated Bidding
cumulative value optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Learning
Few-Shot Optimization
RL-Finetuned LLMs
Budget Allocation
Online Advertising
🔎 Similar Papers
Mingxuan Song
Peking University, School of Computer Science
Yusen Huo
Alibaba Group
Bohan Zhou
Peking University, School of Computer Science
Shenglin Yin
Peking University, School of Computer Science
Zhen Xiao
Peking University
distributed systems, cloud computing, machine learning
Jieyi Long
Northwestern University
Blockchain, Distributed Systems, Generative AI, EDA
Zhilin Zhang
Alibaba Group
Chuan Yu
Alibaba Group