MAFA: A Multi-Agent Framework for Enterprise-Scale Annotation with Configurable Task Adaptation

📅 2025-10-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address a backlog of millions of customer utterances awaiting annotation in financial services, this paper proposes a configurable multi-agent collaborative annotation framework. Methodologically, it introduces a judge-based consensus architecture that integrates specialized agent roles, structured reasoning, confidence grading, human-in-the-loop feedback, and multilingual intent classification, enabling no-code configuration and dynamic task adaptation. The primary contribution is the first production-scale deployment of a multi-agent annotation system in a large financial environment. Deployed at JPMorgan Chase, the framework eliminated a one-million-utterance annotation backlog, saving over 5,000 person-hours annually. Agreement with human annotators reached 86%, while Top-1 accuracy and F1-score improved by 13.8% and 16.9%, respectively, demonstrating significant gains in both efficiency and quality.

๐Ÿ“ Abstract
We present MAFA (Multi-Agent Framework for Annotation), a production-deployed system that transforms enterprise-scale annotation workflows through configurable multi-agent collaboration. Addressing the critical challenge of annotation backlogs in financial services, where millions of customer utterances require accurate categorization, MAFA combines specialized agents with structured reasoning and a judge-based consensus mechanism. Our framework uniquely supports dynamic task adaptation, allowing organizations to define custom annotation types (FAQs, intents, entities, or domain-specific categories) through configuration rather than code changes. Deployed at JPMorgan Chase, MAFA has eliminated a 1 million utterance backlog while achieving 86% agreement with human annotators on average and saving over 5,000 hours of manual annotation work annually. The system assigns each utterance an annotation confidence classification; across all datasets we tested, these are typically 85% high, 10% medium, and 5% low, enabling human annotators to focus exclusively on ambiguous and low-coverage cases. We demonstrate MAFA's effectiveness across multiple datasets and languages, showing consistent improvements over traditional and single-agent annotation baselines: 13.8% higher Top-1 accuracy, 15.1% improvement in Top-5 accuracy, and 16.9% better F1 on our internal intent classification dataset, with similar gains on public benchmarks. This work bridges the gap between theoretical multi-agent systems and practical enterprise deployment, providing a blueprint for organizations facing similar annotation challenges.
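The confidence-based triage the abstract describes (roughly 85% high, 10% medium, 5% low, with humans handling only the ambiguous remainder) might be sketched as follows. The class, function names, and thresholds here are illustrative assumptions, not MAFA's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    utterance: str
    label: str
    confidence: float  # agent/judge confidence in [0, 1]

def triage(ann: Annotation, high: float = 0.9, medium: float = 0.6) -> str:
    """Map a confidence score to a review queue (thresholds are illustrative)."""
    if ann.confidence >= high:
        return "auto-accept"    # the ~85% high-confidence bucket
    if ann.confidence >= medium:
        return "spot-check"     # the ~10% medium-confidence bucket
    return "human-review"       # the ~5% low-confidence bucket

queues = [triage(Annotation("reset my card pin", "card_pin_reset", c))
          for c in (0.95, 0.7, 0.3)]
print(queues)  # ['auto-accept', 'spot-check', 'human-review']
```

Routing only the medium and low buckets to people is what yields the reported 5,000 annually saved hours: the bulk of utterances never reach a human queue.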
Problem

Research questions and friction points this paper is trying to address.

Addresses annotation backlogs in financial services
Enables configurable annotation types without code changes
Improves accuracy over traditional annotation baselines
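The "configuration rather than code changes" idea could look like the hypothetical task definition below: a new annotation type (FAQ, intent, entity) is declared as data and checked by a generic validator. The schema and field names are assumptions for illustration, not MAFA's actual configuration format:

```python
# A hypothetical annotation task declared as configuration (a plain dict
# here, though it could equally be YAML or JSON loaded at runtime).
INTENT_TASK = {
    "task_type": "intent",
    "labels": ["card_pin_reset", "balance_inquiry", "dispute_charge"],
    "languages": ["en", "es"],
    "min_confidence_to_auto_accept": 0.9,
}

def validate_output(task: dict, label: str, language: str) -> bool:
    """Generically check a proposed annotation against the task config."""
    return label in task["labels"] and language in task["languages"]

print(validate_output(INTENT_TASK, "balance_inquiry", "en"))   # True
print(validate_output(INTENT_TASK, "loan_application", "en"))  # False
```

Because the validator reads everything from the task dict, adding a new annotation type means adding a new configuration entry, with no change to the pipeline code.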
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for enterprise-scale annotation workflows
Configurable task adaptation without code changes
Judge-based consensus mechanism for accurate categorization
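The judge-based consensus step can be sketched as follows. In MAFA the judge reasons over structured proposals from the specialized agents; as a simplifying assumption, this sketch approximates that with confidence-weighted voting, and all names are illustrative:

```python
from collections import Counter

def judge(proposals: list[tuple[str, float]]) -> tuple[str, float]:
    """Aggregate (label, confidence) proposals into a consensus label.

    Returns the winning label and a judge confidence, taken here as the
    winner's share of the total confidence mass across all proposals.
    """
    weights: Counter[str] = Counter()
    for label, conf in proposals:
        weights[label] += conf
    label, score = weights.most_common(1)[0]
    return label, score / sum(weights.values())

# Three specialized agents propose labels for one utterance.
label, conf = judge([("card_pin_reset", 0.9),
                     ("card_pin_reset", 0.7),
                     ("dispute_charge", 0.4)])
print(label, round(conf, 2))  # card_pin_reset 0.8
```

The resulting judge confidence is exactly what the triage step above would consume to decide whether the annotation is auto-accepted or escalated to a human.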