🤖 AI Summary
This work addresses the tension in content promotion between short-term revenue and long-term recommendation model performance: existing auction mechanisms, while mitigating cold-start issues, may introduce exposure misalignment that contaminates learning signals. The problem is formulated as a bi-objective optimization task balancing immediate gains and future model utility. Fisher information and optimal experimental design are leveraged, for the first time in this setting, to derive a gradient coverage metric and a confidence-gated gradient heuristic, enabling zeroth-order estimation of learning signals under black-box models. Building on this, a two-stage automated bidding algorithm is developed, integrating Lagrangian dual optimization, online budget control, and marginal-utility-based bidding. Experiments on both synthetic and real-world datasets demonstrate that the proposed method significantly outperforms baselines, consistently improving final AUC, reducing LogLoss, precisely managing budgets, and maintaining robustness under zeroth-order gradient approximation.
📝 Abstract
Modern content platforms offer paid promotion to mitigate cold start by allocating exposure via auctions. Our empirical analysis reveals a counterintuitive flaw in this paradigm: while promotion rescues low-to-medium-quality content, it can harm high-quality content by forcing exposure to suboptimal audiences, polluting engagement signals and degrading future recommendations. We recast content promotion as a dual-objective optimization that balances short-term value acquisition with long-term model improvement. To make this tractable at bid time, we introduce a decomposable surrogate objective, gradient coverage, and establish its formal connection to Fisher information and optimal experimental design. We design a two-stage auto-bidding algorithm based on Lagrange duality that dynamically paces budget through a shadow price and optimizes impression-level bids using per-impression marginal utilities. To address missing labels at bid time, we propose a confidence-gated gradient heuristic, paired with a zeroth-order variant for black-box models that reliably estimates learning signals in real time. We provide theoretical guarantees, proving monotone submodularity of the composite objective, sublinear regret in online auctions, and budget feasibility. Extensive offline experiments on synthetic and real-world datasets validate the framework: it outperforms baselines, achieves superior final AUC/LogLoss, adheres closely to budget targets, and remains effective when gradients are approximated via zeroth-order methods. These results show that strategic, information-aware promotion can improve long-term model performance and organic outcomes beyond naive impression-maximization strategies.
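The bid-time mechanics described in the abstract — a zeroth-order estimate of a learning signal for a black-box model, combined with a marginal-utility bid discounted by the budget's shadow price — can be sketched roughly as below. This is a minimal illustration, not the paper's method: the names (`spsa_grad_norm`, `bid`), the quadratic toy loss, and the use of a standard SPSA estimator in place of the paper's gradient coverage metric and confidence gate are all assumptions made for the sketch.

```python
import numpy as np

def spsa_grad_norm(loss_fn, theta, eps=1e-3, n_probes=8, rng=None):
    """Zeroth-order (SPSA-style) estimate of the gradient norm of a
    black-box loss at theta, via random +/-1 perturbation probes.
    Stands in for a learning-signal estimate when gradients are
    unavailable at bid time."""
    if rng is None:
        rng = np.random.default_rng(0)
    est = np.zeros_like(theta)
    for _ in range(n_probes):
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two-point finite difference along the random direction.
        g = (loss_fn(theta + eps * delta) - loss_fn(theta - eps * delta)) / (2 * eps)
        est += g * delta  # with +/-1 entries, 1/delta_i == delta_i
    est /= n_probes
    return float(np.linalg.norm(est))

def bid(short_term_value, learning_signal, shadow_price, weight=1.0):
    """Per-impression marginal-utility bid: immediate value plus a
    weighted learning signal, discounted by the dual (shadow) price
    that paces the remaining budget. Never bids below zero."""
    utility = short_term_value + weight * learning_signal
    return max(utility / (1.0 + shadow_price), 0.0)
```

In a Lagrangian-dual pacing loop, the shadow price would be raised when spend runs ahead of the budget schedule and lowered when it lags, so the same marginal-utility rule automatically throttles or opens up bidding.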