DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

📅 2026-05-03

🤖 AI Summary
This work addresses the bottleneck of constructing controllable visual data, where a single synthesis pass rarely yields useful supervision, by proposing DataEvolver, a goal-driven, closed-loop visual data engine. The framework refines and expands multimodal outputs (RGB images, masks, depth maps, surface normals, meshes, poses, and trajectories) through two coupled loops: self-correction within each sample at generation time and self-expansion across dataset rounds at validation time. It combines scene-aware generation, feedback-driven correction, and dual-gated validation to maintain fidelity and consistency. On an image-level object-rotation task with a fixed Qwen-Edit LoRA probe, the final Ours+DualGate model outperforms both the unadapted base model and a public multi-angle LoRA on SpatialEdit and a held-out evaluation set. Ablations show consistent gains as each component is added.
📝 Abstract
Constructing controllable visual data is a major bottleneck for image editing and multimodal understanding. Useful supervision is rarely produced by a single rendering pass; instead it emerges through iterative generation, inspection, correction, filtering, and export. We present DataEvolver, a closed-loop visual data engine that organizes this process around explicit goals, persistent artifacts, bounded corrective actions, and acceptance decisions. DataEvolver supports multiple artifact types, including RGB images, masks, depth maps, normal maps, meshes, poses, trajectories, and review traces. In the current release, the system operates through two coupled loops: generation-time self-correction within each sample and validation-time self-expansion across dataset rounds. We validate the framework on an image-level object-rotation setting. With a fixed Qwen-Edit LoRA probe, our final Ours+DualGate model outperforms both the unadapted base model and a public multi-angle LoRA on SpatialEdit and a held-out evaluation set. Ablations show a consistent improvement path from scene-aware generation to feedback-driven correction and dual-gated validation. Beyond the released rotation data, our main contribution is a reusable framework for building visual datasets through explicit goal tracking, review, correction, and acceptance loops.
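The two coupled loops described in the abstract can be pictured with a small toy sketch. It is not the DataEvolver implementation: the names Sample, meets_goal, correct, self_correct, and self_expand, the scalar goal, the tolerance and step parameters, and the rule that rejected goals are retried in the next round are all assumptions made for illustration, and the dual-gated validation is reduced to a single acceptance check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """Toy persistent artifact: an explicit goal, a current value, a review trace."""
    goal: float                      # hypothetical target, e.g. a rotation angle
    value: float                     # current state of the generated artifact
    trace: List[str] = field(default_factory=list)


def meets_goal(sample: Sample, tol: float) -> bool:
    """Inspection step: is the artifact within `tol` of its goal?"""
    return abs(sample.goal - sample.value) <= tol


def correct(sample: Sample, step: float) -> None:
    """Bounded corrective action: move toward the goal by at most `step`."""
    error = sample.goal - sample.value
    delta = max(-step, min(step, error))
    sample.value += delta
    sample.trace.append(f"corrected by {delta:+.2f}")


def self_correct(sample: Sample, tol: float, step: float, max_actions: int) -> bool:
    """Inner loop: review, then correct, for at most `max_actions` corrections."""
    for _ in range(max_actions):
        if meets_goal(sample, tol):
            break
        correct(sample, step)
    accepted = meets_goal(sample, tol)   # acceptance decision
    sample.trace.append("accepted" if accepted else "rejected")
    return accepted


def self_expand(
    goals: List[float],
    generate: Callable[[float, int], float],
    rounds: int,
    tol: float = 1.0,
    step: float = 10.0,
    max_actions: int = 3,
) -> Tuple[List[Sample], List[float]]:
    """Outer loop: grow the dataset over rounds, retrying rejected goals."""
    dataset: List[Sample] = []
    pending = list(goals)
    for round_index in range(rounds):
        retry: List[float] = []
        for goal in pending:
            sample = Sample(goal=goal, value=generate(goal, round_index))
            if self_correct(sample, tol, step, max_actions):
                dataset.append(sample)
            else:
                retry.append(goal)
        pending = retry
        if not pending:
            break
    return dataset, pending
```

Calling self_expand with a list of goals and a generator function returns the accepted samples together with any goals still unmet after the last round. Each accepted sample carries its own trace of corrections, which loosely mirrors the review traces the abstract lists among the supported artifact types.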
Problem

Research questions and friction points this paper is trying to address.

controllable visual data
image editing
multimodal understanding
iterative data generation
supervision bottleneck
Innovation

Methods, ideas, or system contributions that make the work stand out.

goal-driven loop
visual data engine
self-correction
dual-gated validation
closed-loop data generation
Qisong Zhang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Wenzhuo Wu
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Zhuangzhuang Jia
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Yunhao Yang
University of Texas at Austin
Formal methods, Autonomy, Privacy
Huayu Zhang
Senior Engineer, Huawei Technologies Co., Ltd
Distributed System, Network Science, Machine Learning, Optimization, Graph Theory
Xianghao Zang
Institute of Artificial Intelligence (TeleAI), China Telecom
Zhixiang He
Institute of Artificial Intelligence (TeleAI), China Telecom
Zhongjiang He
Institute of Artificial Intelligence (TeleAI), China Telecom
Kongming Liang
Beijing University of Posts and Telecommunications
Computer Vision, Pattern Recognition, Machine Learning
Zhanyu Ma
Beijing University of Posts and Telecommunications
Pattern Recognition, Machine Learning, Computer Vision, Multimedia Technology, Deep Learning