🤖 AI Summary
This work addresses the challenge of maximizing an advertiser's cumulative value under strict budget constraints in low-data online advertising settings. To this end, the authors propose DARA, a two-stage framework: the first stage leverages the in-context learning capability of large language models (LLMs) to generate an initial campaign plan, while the second stage refines this plan through feedback-driven reasoning for precise numerical optimization. The approach combines the few-shot generalization strength of LLMs with reinforcement-learning fine-tuning, introducing GRPO-Adaptive, a post-training strategy that dynamically updates the reference policy during training. By decoupling decision-making into distinct reasoning and optimization phases, DARA outperforms existing baselines on both real-world and synthetic datasets, consistently improving advertisers' cumulative value under stringent budget limits.
📝 Abstract
Optimizing an advertiser's cumulative value of winning impressions under budget constraints is a complex challenge in online advertising under the AI-Generated Bidding (AIGB) paradigm. Advertisers often have personalized objectives but limited historical interaction data, resulting in few-shot scenarios where traditional reinforcement learning (RL) methods struggle to perform effectively. Large Language Models (LLMs) offer a promising alternative for AIGB by leveraging their in-context learning capabilities to generalize from limited data; however, they lack the numerical precision required for fine-grained optimization. To address this limitation, we introduce GRPO-Adaptive, an efficient LLM post-training strategy that enhances both reasoning and numerical precision by dynamically updating the reference policy during training. Built upon this foundation, we further propose DARA, a novel dual-phase framework that decomposes the decision-making process into two stages: a few-shot reasoner that generates initial plans via in-context prompting, and a fine-grained optimizer that refines these plans through feedback-driven reasoning. This separation allows DARA to combine LLMs' in-context learning strengths with the precise adaptability required by AIGB tasks. Extensive experiments in both real-world and synthetic data environments demonstrate that our approach consistently outperforms existing baselines in cumulative advertiser value under budget constraints.
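The abstract's core mechanism, GRPO-Adaptive, builds on GRPO's group-relative reward normalization while dynamically updating the reference policy during training. A minimal, hypothetical sketch of these two ingredients is below; the group-normalization step follows standard GRPO, but the EMA-style reference update is only one plausible interpretation of "dynamically updating the reference policy," not the paper's actual rule, and all function names are illustrative:

```python
import math

def group_relative_advantages(rewards):
    """Standard GRPO-style step: normalize rewards within a group of
    sampled responses, so each advantage is relative to the group."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) + 1e-8  # small epsilon avoids division by zero
    return [(r - mean) / std for r in rewards]

def ema_update(ref_params, policy_params, tau=0.05):
    """One possible 'adaptive reference' rule (an assumption, not the
    paper's): move the KL reference policy toward the current policy
    via an exponential moving average of parameters."""
    return [(1 - tau) * rp + tau * pp for rp, pp in zip(ref_params, policy_params)]

# Toy usage: four sampled plans with scalar rewards.
rewards = [1.0, 0.2, 0.5, 0.9]
advs = group_relative_advantages(rewards)
print([round(a, 3) for a in advs])  # zero-mean, unit-scale advantages

# Reference parameters drift slowly toward the current policy each step.
ref = ema_update([0.0, 0.0], [1.0, -1.0])
print(ref)
```

Keeping the reference close to the current policy loosens the KL anchor as training progresses, which is one way a dynamically updated reference could permit finer-grained numerical adjustments than a frozen reference.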