LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision

📅 2025-09-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the fundamental trade-off among data quality, diversity, and cost in constructing high-quality domain-specific datasets for computer vision, this paper introduces the first deep research agent framework dedicated to visual data curation. The framework employs a large multimodal language model as its central orchestrator and coordinates specialized tools through multi-step reasoning: calibration-driven category discovery, controllable image synthesis, and consensus-based annotation built on non-maximum suppression and voting-based aggregation across multiple foundation models. Evaluated on COCO, it achieves a final annotation mAP of 37.1% and generates an average of 14.2 candidate object proposals per image. On Open Images, it discovers 903 new fine-grained bounding-box categories, expanding total coverage to over 1,500 classes. Its active learning strategy, tested at a 10-million-sample scale, is up to 40× more computationally efficient than alternatives with equivalent sample efficiency. Collectively, the framework advances automation, sample efficiency, and semantic breadth in dataset construction.

📝 Abstract
Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems, requiring complex trade-offs between data quality, diversity, and cost when researching vast, unlabeled data lakes. We introduce Labeling Copilot, the first data curation deep research agent for computer vision. A central orchestrator agent, powered by a large multimodal language model, uses multi-step reasoning to execute specialized tools across three core capabilities: (1) Calibrated Discovery sources relevant, in-distribution data from large repositories; (2) Controllable Synthesis generates novel data for rare scenarios with robust filtering; and (3) Consensus Annotation produces accurate labels by orchestrating multiple foundation models via a novel consensus mechanism incorporating non-maximum suppression and voting. Our large-scale validation proves the effectiveness of Labeling Copilot's components. The Consensus Annotation module excels at object discovery: on the dense COCO dataset, it averages 14.2 candidate proposals per image (nearly double the 7.4 ground-truth objects), achieving a final annotation mAP of 37.1%. On the web-scale Open Images dataset, it navigated extreme class imbalance to discover 903 new bounding box categories, expanding its capability to over 1,500 total. Concurrently, our Calibrated Discovery tool, tested at a 10-million sample scale, features an active learning strategy that is up to 40× more computationally efficient than alternatives with equivalent sample efficiency. These experiments validate that an agentic workflow with optimized, scalable tools provides a robust foundation for curating industrial-scale datasets.
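The consensus mechanism described above (pooling boxes from several foundation models, merging them with non-maximum suppression, and keeping regions that multiple models vote for) can be illustrated with a minimal sketch. This is a hypothetical reading of the mechanism, not the paper's implementation; the IoU threshold, vote count, and greedy clustering are assumptions.

```python
# Hypothetical sketch of consensus annotation: pool bounding boxes
# proposed by several models, cluster overlapping boxes greedily
# (NMS-style), and keep a box only when enough distinct models
# "vote" for that region. Not the paper's actual implementation.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def consensus_boxes(
    per_model: List[List[Box]],   # one proposal list per model
    iou_thresh: float = 0.5,      # assumed overlap threshold
    min_votes: int = 2,           # assumed minimum model agreement
) -> List[Box]:
    """Greedy NMS over the pooled boxes; a cluster survives only if
    at least `min_votes` distinct models contributed a box to it."""
    pooled = [(box, m) for m, boxes in enumerate(per_model) for box in boxes]
    used = [False] * len(pooled)
    kept: List[Box] = []
    for i, (box, _) in enumerate(pooled):
        if used[i]:
            continue
        voters = set()
        for j, (other, model_j) in enumerate(pooled):
            if not used[j] and iou(box, other) >= iou_thresh:
                used[j] = True      # absorb into this cluster
                voters.add(model_j)
        if len(voters) >= min_votes:
            kept.append(box)
    return kept
```

With two models proposing nearly identical boxes and a third proposing a box nowhere near them, only the agreed-upon region survives the vote.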
Problem

Research questions and friction points this paper is trying to address.

Automating high-quality dataset curation for computer vision
Addressing data quality, diversity, and cost trade-offs
Developing scalable tools for industrial-scale data annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

The orchestrator agent uses multi-step reasoning to dispatch specialized tools
Calibrated Discovery sources relevant, in-distribution data from large repositories
Consensus Annotation produces accurate labels by orchestrating multiple foundation models
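The Calibrated Discovery tool pairs calibrated model scores with an active learning strategy to pick which samples from a large unlabeled pool are worth curating. One plausible (and deliberately simple) form of such a strategy is least-confidence sampling over calibrated class probabilities; the function below is an illustrative assumption, not the paper's algorithm.

```python
# Hypothetical sketch of an uncertainty-based active learning step:
# given calibrated class probabilities for each unlabeled sample,
# select the `budget` samples whose top-class probability is lowest
# (i.e. where the model is least confident). This is one plausible
# reading of the Calibrated Discovery strategy, not the paper's method.
from typing import List, Sequence

def select_uncertain(
    probs: Sequence[Sequence[float]],  # calibrated probabilities per sample
    budget: int,                       # how many samples to send for labeling
) -> List[int]:
    """Return indices of the `budget` least-confident samples."""
    confidence = [max(p) for p in probs]
    order = sorted(range(len(probs)), key=lambda i: confidence[i])
    return order[:budget]
```

A single scoring pass over the pool followed by a sort keeps the per-round cost linear in the pool size, which is the kind of property that matters at the 10-million-sample scale the paper reports.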