ARGUS: Policy-Adaptive Ad Governance via Evolving Reinforcement with Adversarial Umpiring

📅 2026-05-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a problem in online advertising governance: regulatory policies evolve over time, which leaves historical labels inconsistent with current rules and makes violation detection ambiguous. To tackle this, the authors propose ARGUS, a multi-agent adversarial umpiring framework built on a Prosecutor-Defender-Umpire paradigm. ARGUS adapts to new policies in three stages: Policy Seeding, Adversarial Label Rectification, and Latent Knowledge Discovery. The system combines RAG-enhanced policy knowledge, Chain-of-Thought synthesis, adversarial learning, and reinforcement learning into a dynamic, reward-driven reasoning mechanism. Experiments on both industrial and public datasets show that ARGUS significantly outperforms conventional fine-tuning baselines, identifying gray-area violations and adapting to new policies with minimal labeled data.

📝 Abstract

Online advertising governance faces significant challenges due to the non-stationary nature of regulatory policies, where emerging mandates (e.g., restrictions on advertising related to education or aesthetic anxiety) create severe label inconsistencies and reasoning ambiguities in historical datasets. In this paper, we propose ARGUS, a policy-adaptive governance system that enables evolving reinforcement through multi-agent adversarial umpiring. ARGUS addresses the sparsity of new policy data with a three-stage framework: (1) Policy Seeding for initial perception; (2) Adversarial Label Rectification, which uses a "Prosecutor-Defender-Umpire" architecture to resolve conflicts between stale labels and new mandates; and (3) Latent Knowledge Discovery, which employs a tripartite dialectical discussion to unearth sophisticated, "gray-area" violations. By leveraging RAG-enhanced policy knowledge and Chain-of-Thought synthesis as dynamic rewards for reinforcement learning, ARGUS synchronizes its reasoning pathways with evolving regulations. Extensive experiments on both industrial and public datasets demonstrate that ARGUS significantly outperforms traditional fine-tuning baselines, achieving superior policy-adaptive learning with minimal gold data.
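The abstract does not spell out how the Prosecutor-Defender-Umpire stage is implemented, so the following is only a minimal sketch of the general idea behind adversarial label rectification. Every name here (`retrieve_policies`, `prosecutor`, `defender`, `umpire`, `rectify`, the `Argument` and `Verdict` types, and the keyword-indexed policy store) is hypothetical, and the three agents are simple rule-based stand-ins for what would presumably be LLM agents in the actual system.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Argument:
    role: str           # "prosecutor" or "defender"
    claim: str          # free-text argument
    cites_policy: bool  # True if grounded in a retrieved policy clause


@dataclass
class Verdict:
    label: str     # "violation" or "compliant"
    changed: bool  # True if the stale label was overturned


def retrieve_policies(ad_text: str, policy_store: Dict[str, str]) -> List[str]:
    """Toy stand-in for RAG retrieval: match trigger keywords in the ad."""
    text = ad_text.lower()
    return [clause for keyword, clause in policy_store.items() if keyword in text]


def prosecutor(ad_text: str, clauses: List[str]) -> Argument:
    """Argues for a violation when some retrieved clause applies."""
    if clauses:
        return Argument("prosecutor", f"Ad may breach: {clauses[0]}", True)
    return Argument("prosecutor", "No applicable clause found", False)


def defender(ad_text: str, clauses: List[str]) -> Argument:
    """Argues for the historical (possibly stale) label."""
    return Argument("defender", "Historical label should stand", False)


def umpire(stale_label: str, pro: Argument, dfn: Argument) -> Verdict:
    """Overturns the stale label only if the prosecutor alone cites policy."""
    if pro.cites_policy and not dfn.cites_policy:
        new_label = "violation"
    else:
        new_label = stale_label
    return Verdict(new_label, new_label != stale_label)


def rectify(ad_text: str, stale_label: str, policy_store: Dict[str, str]) -> Verdict:
    """One prosecutor/defender/umpire round over a single ad."""
    clauses = retrieve_policies(ad_text, policy_store)
    return umpire(stale_label, prosecutor(ad_text, clauses), defender(ad_text, clauses))


if __name__ == "__main__":
    # Hypothetical policy store: trigger keyword -> new mandate text.
    store = {"tutoring": "New mandate restricting after-school tutoring ads"}
    print(rectify("Enroll in our tutoring program today", "compliant", store))
    print(rectify("Buy fresh apples", "compliant", store))
```

In this toy version, an ad labeled compliant under an older policy is relabeled only when a newly retrieved mandate supports the prosecutor's claim; otherwise the historical label is kept. How ARGUS actually weighs the two sides, and how the resulting verdicts feed the reinforcement-learning reward, is not described in the abstract.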

Problem

Research questions and friction points this paper is trying to address.

online advertising governance

non-stationary regulatory policies

label inconsistency

policy adaptation

adversarial reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

policy-adaptive governance

adversarial umpiring

reinforcement learning

label rectification

gray-area violation

🔎 Similar Papers

Truthful Aggregation of LLMs with an Application to Online Advertising

2024-05-09, arXiv.org, Citations: 7