Adversarial Arena: Crowdsourcing Data Generation through Interactive Competition

📅 2026-04-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the scarcity and high cost of high-quality, diverse multi-turn dialogue data for post-training large language models, particularly in low-resource domains. To tackle this challenge, the authors propose an adversarial arena–based data generation framework that reframes data construction as a multi-agent competitive interaction: an attacker designs challenging prompts while a defender generates aligned responses. Integrated with a crowdsourcing incentive mechanism, this approach autonomously produces dialogues exhibiting high difficulty and diversity. Focusing on safety alignment in the cybersecurity domain, the project generated 19,683 multi-turn dialogues. Models fine-tuned on this dataset demonstrate significant improvements in secure code generation, achieving performance gains of 18.47% and 29.42% on the CyberSecEval-Instruct and CyberSecEval-MITRE benchmarks, respectively.

📝 Abstract

Post-training Large Language Models requires diverse, high-quality data, which is rare and costly to obtain, especially in low-resource domains and for multi-turn conversations. Common solutions are crowdsourcing or synthetic generation, but both often yield low-quality or low-diversity data. We introduce Adversarial Arena for building high-quality conversational datasets by framing data generation as an adversarial task: attackers create prompts, and defenders generate responses. This interactive competition between multiple teams naturally produces diverse and complex data. We validated this approach by conducting a competition with 10 academic teams from top US and European universities, each building attacker or defender bots. The competition, focused on safety alignment of LLMs in cybersecurity, generated 19,683 multi-turn conversations. Fine-tuning an open-source model on this dataset produced an 18.47% improvement in secure code generation on CyberSecEval-Instruct and a 29.42% improvement on CyberSecEval-MITRE.
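The attacker/defender framing in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the names `run_match`, `run_arena`, `Conversation`, the toy bots, and the all-pairs matchmaking are assumptions made for illustration, and the abstract does not say how teams were paired or how many turns each conversation had.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a bot maps the conversation so far to its next message.
Turn = Tuple[str, str]  # (role, text)
Bot = Callable[[List[Turn]], str]


@dataclass
class Conversation:
    attacker: str
    defender: str
    turns: List[Turn] = field(default_factory=list)


def run_match(attacker_name: str, attacker: Bot,
              defender_name: str, defender: Bot,
              max_turns: int = 3) -> Conversation:
    """Alternate attacker prompts and defender responses for max_turns rounds."""
    convo = Conversation(attacker_name, defender_name)
    for _ in range(max_turns):
        prompt = attacker(convo.turns)
        convo.turns.append(("attacker", prompt))
        reply = defender(convo.turns)
        convo.turns.append(("defender", reply))
    return convo


def run_arena(attackers: Dict[str, Bot], defenders: Dict[str, Bot],
              max_turns: int = 3) -> List[Conversation]:
    """Pair every attacker with every defender (an assumed matchmaking rule)."""
    return [
        run_match(a, attackers[a], d, defenders[d], max_turns)
        for a, d in product(attackers, defenders)
    ]


# Toy stand-ins for team-built bots; real bots would call an LLM.
def toy_attacker(history: List[Turn]) -> str:
    turn_index = len(history) // 2 + 1
    return f"attack prompt {turn_index}"


def toy_defender(history: List[Turn]) -> str:
    last_prompt = history[-1][1]
    return f"safe reply to: {last_prompt}"


dataset = run_arena(
    {"A1": toy_attacker, "A2": toy_attacker},
    {"D1": toy_defender},
    max_turns=2,
)
```

Each `Conversation` in `dataset` is one multi-turn record. In a real arena, diversity would come from independently built bots on both sides rather than from the loop itself.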

Problem

Research questions and friction points this paper is trying to address.

data generation

large language models

crowdsourcing

multi-turn conversations

low-resource domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Arena

interactive competition

crowdsourced data generation

safety alignment

multi-turn conversations
