MAC: Multi-Agent Constitution Learning

📅 2026-03-16
🤖 AI Summary
This work addresses two limitations of existing LLM-based prompt optimizers when they are used to learn constitutions: they require many labeled examples, and their unstructured prompts yield diminishing improvements as prompt size grows. The authors propose MAC (Multi-Agent Constitutional Learning), which represents the prompt as a set of natural language rules and uses a network of specialized agents to accept, edit, or reject rule updates, producing a human-readable and auditable rule set. An enhanced variant, MAC+, trains the agents on successful trajectories to reinforce updates that lead to higher reward. Evaluated on PII tagging, a classification task with limited labels, MAC outperforms recent prompt optimization methods by over 50% and achieves performance comparable to supervised fine-tuning and GRPO without requiring parameter updates. The authors also report that it generalizes to other agentic tasks such as tool calling.

📝 Abstract
Constitutional AI is a method to oversee and control LLMs based on a set of rules written in natural language. These rules are typically written by human experts, but could in principle be learned automatically given sufficient training data for the desired behavior. Existing LLM-based prompt optimizers attempt this but are ineffective at learning constitutions since (i) they require many labeled examples and (ii) they lack structure in the optimized prompts, leading to diminishing improvements as prompt size grows. To address these limitations, we propose Multi-Agent Constitutional Learning (MAC), which optimizes over structured prompts represented as sets of rules using a network of agents with specialized tasks to accept, edit, or reject rule updates. We also present MAC+, which improves performance by training agents on successful trajectories to reinforce updates leading to higher reward. We evaluate MAC on tagging Personally Identifiable Information (PII), a classification task with limited labels where interpretability is critical, and demonstrate that it generalizes to other agentic tasks such as tool calling. MAC outperforms recent prompt optimization methods by over 50%, produces human-readable and auditable rule sets, and achieves performance comparable to supervised fine-tuning and GRPO without requiring parameter updates.
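
The accept/edit/reject loop over a rule set described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: in MAC the agents are LLMs, whereas here they are replaced by deterministic stub functions so the example runs on its own. The rule format (a substring to flag), the helper names (`predict`, `score`, `edit`, `learn_constitution`), the sample data, and the acceptance criterion (strict accuracy improvement on the labeled examples) are all assumptions made for illustration.

```python
from typing import List, Tuple

Example = Tuple[str, bool]  # (text, contains_pii); hypothetical toy data format


def predict(rules: List[str], text: str) -> bool:
    """Toy classifier: flag the text if any rule's substring occurs in it."""
    lowered = text.lower()
    return any(rule in lowered for rule in rules)


def score(rules: List[str], examples: List[Example]) -> float:
    """Accuracy of the current rule set on the labeled examples."""
    correct = sum(predict(rules, text) == label for text, label in examples)
    return correct / len(examples)


def edit(raw_rule: str) -> str:
    """Stand-in for an editing agent: normalize a proposed rule."""
    return raw_rule.strip().lower()


def learn_constitution(
    candidates: List[str], examples: List[Example]
) -> Tuple[List[str], float]:
    """Greedy loop: edit each proposed rule, then accept or reject it."""
    rules: List[str] = []
    best = score(rules, examples)
    for raw in candidates:  # stand-in for a proposing agent
        trial = rules + [edit(raw)]
        trial_score = score(trial, examples)
        if trial_score > best:  # accept only if accuracy strictly improves
            rules, best = trial, trial_score
        # otherwise the update is rejected and the rule set is unchanged
    return rules, best


examples = [
    ("Email me at bob@example.com", True),
    ("Call 555-0100 after lunch", True),
    ("The meeting is at noon", False),
    ("We met at the park", False),
]
candidates = ["  @ ", "at", "555-"]

rules, best = learn_constitution(candidates, examples)
print(rules, best)  # ['@', '555-'] 1.0
```

In this run the candidate `"at"` is rejected because it flags both non-PII sentences and lowers accuracy, while `"@"` and `"555-"` are accepted. The resulting rule list stays short and readable, which is the property the abstract emphasizes for auditability.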

Problem

Research questions and friction points this paper is trying to address.

Constitutional AI
prompt optimization
structured prompts
rule learning
label efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constitutional AI
Multi-Agent Learning
Structured Prompt Optimization
Rule-based Interpretability
Prompt Engineering