InstaDA: Augmenting Instance Segmentation Data with Dual-Agent System

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Instance segmentation faces two challenges: prohibitively expensive high-quality annotation and severe class imbalance. To address these, we propose a dual-agent collaborative data augmentation framework that enhances data quality and diversity without requiring model training. A Text-Agent—implemented via a large language model—dynamically refines textual prompts through a Prompt Rethink mechanism; an Image-Agent—based on a diffusion model—generates high-fidelity instance masks conditioned on these refined prompts, which are then composited into training images via Copy-Paste augmentation while preserving semantic consistency. The framework fully leverages prior knowledge from the original dataset, significantly improving data utilization efficiency and generation controllability. On the LVIS 1.0 validation set, our method achieves +4.0 box AP and +3.3 mask AP over the baseline, outperforming DiverGen—especially on common and frequent categories.
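The Copy-Paste compositing step described above can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the function name, the mask representation (boolean arrays), and the occlusion handling are assumptions for the sketch.

```python
import numpy as np

def copy_paste(target_img, target_masks, inst_img, inst_mask):
    """Composite a generated instance into a training image.

    target_img:   H x W x 3 uint8 image to augment
    target_masks: list of H x W boolean masks already in the image
    inst_img:     H x W x 3 uint8 image containing the new instance
    inst_mask:    H x W boolean mask of the new instance
    """
    out = target_img.copy()
    out[inst_mask] = inst_img[inst_mask]  # paste the instance's pixels
    # Carve the new instance out of existing masks it now occludes,
    # then register it as an additional instance.
    updated = [m & ~inst_mask for m in target_masks]
    updated.append(inst_mask)
    return out, updated
```

In practice the pasted instance would also be randomly scaled and translated before compositing; the sketch omits that for brevity.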

📝 Abstract
Acquiring high-quality instance segmentation data is challenging due to the labor-intensive nature of the annotation process and significant class imbalances within datasets. Recent studies have utilized the integration of Copy-Paste and diffusion models to create more diverse datasets. However, these studies often lack deep collaboration between large language models (LLMs) and diffusion models, and underutilize the rich information within the existing training data. To address these limitations, we propose InstaDA, a novel, training-free Dual-Agent system designed to augment instance segmentation datasets. First, we introduce a Text-Agent (T-Agent) that enhances data diversity through collaboration between LLMs and diffusion models. This agent features a novel Prompt Rethink mechanism, which iteratively refines prompts based on the generated images. This process not only fosters collaboration but also increases image utilization and optimizes the prompts themselves. Additionally, we present an Image-Agent (I-Agent) aimed at enriching the overall data distribution. This agent augments the training set by generating new instances conditioned on the training images. To ensure practicality and efficiency, both agents operate as independent and automated workflows, enhancing usability. Experiments conducted on the LVIS 1.0 validation set indicate that InstaDA achieves significant improvements, with an increase of +4.0 in box average precision (AP) and +3.3 in mask AP compared to the baseline. Furthermore, it outperforms the leading model, DiverGen, by +0.3 in box AP and +0.1 in mask AP, with a notable +0.7 gain in box AP on common categories and mask AP gains of +0.2 on common categories and +0.5 on frequent categories.
Problem

Research questions and friction points this paper is trying to address.

High annotation cost and severe class imbalance in instance segmentation datasets
Shallow collaboration between LLMs and diffusion models in prior augmentation work
Underuse of the rich information already present in existing training data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Agent system for instance segmentation data augmentation
Text-Agent with Prompt Rethink mechanism using LLMs
Image-Agent that generates new instances conditioned on training images
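The Prompt Rethink mechanism listed above can be sketched as an iterative refine loop. This is a hypothetical illustration of the idea, not the paper's code: the function name, the callback signatures, and the early-stop condition are assumptions; `generate_image` stands in for the diffusion model and `critique_prompt` for the LLM critic.

```python
def prompt_rethink(initial_prompt, generate_image, critique_prompt, rounds=3):
    """Iteratively refine a text prompt based on the images it produces.

    generate_image:  callable, prompt -> image (the diffusion model)
    critique_prompt: callable, (prompt, image) -> refined prompt (the LLM)
    Returns the final prompt and the last image generated from it.
    """
    prompt = initial_prompt
    image = generate_image(prompt)
    for _ in range(rounds):
        refined = critique_prompt(prompt, image)
        if refined == prompt:  # critic is satisfied; stop early
            break
        prompt = refined
        image = generate_image(prompt)
    return prompt, image
```

Because both callables are parameters, the loop runs unchanged whether the critic is a real LLM or a cheap heuristic, which keeps the agent's workflow independent and automated, as the abstract emphasizes.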