From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

📅 2026-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of current large multimodal models: they rely on static datasets and fixed training pipelines, and so cannot dynamically identify and rectify capability gaps. To overcome this, the authors propose a Diagnostic-driven Progressive Evolution (DPE) framework that establishes a spiral training loop of "diagnose–generate–reinforce." By iteratively attributing model failure cases to specific weaknesses, DPE dynamically adjusts data-mixture ratios and orchestrates multi-agent collaboration to generate high-quality, targeted multimodal data using tools such as web search and image editing, thereby enabling continual learning over open-ended task distributions. Evaluated on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct, DPE consistently improves performance across 11 benchmark datasets, demonstrating its effectiveness and scalability.

📝 Abstract
As Large Multimodal Models (LMMs) scale up and reinforcement learning (RL) methods mature, LMMs have made notable progress in complex reasoning and decision-making. Yet training still relies on static data and fixed recipes, making it difficult to diagnose capability blind spots or provide dynamic, targeted reinforcement. Motivated by findings that test-driven error exposure and feedback-based correction outperform repetitive practice, we propose Diagnostic-driven Progressive Evolution (DPE), a spiral loop in which diagnosis steers data generation and reinforcement, and each iteration re-diagnoses the updated model to drive the next round of targeted improvement. DPE has two key components. First, multiple agents annotate and quality-control massive unlabeled multimodal data, using tools such as web search and image editing to produce diverse, realistic samples. Second, DPE attributes failures to specific weaknesses, dynamically adjusts the data mixture, and guides agents to generate weakness-focused data for targeted reinforcement. Experiments on Qwen3-VL-8B-Instruct and Qwen2.5-VL-7B-Instruct show stable, continual gains across eleven benchmarks, establishing DPE as a scalable paradigm for continual LMM training under open task distributions. Our code, models, and data are publicly available at https://github.com/hongruijia/DPE.
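The spiral loop the abstract describes can be sketched as a simple iteration: diagnose the weakest capabilities, generate data targeting them, reinforce, and repeat. The sketch below is purely illustrative; all function names, the capability list, and the scoring logic are hypothetical stand-ins, not the authors' implementation (which involves actual LMM training, multi-agent data generation, and RL).

```python
# Hypothetical sketch of the DPE "diagnose-generate-reinforce" loop.
# Capabilities, scores, and all helpers below are illustrative stand-ins;
# the real framework trains an LMM and uses agents with web search / image editing.

def diagnose(score_fn, capabilities, k=2):
    """Attribute failures to specific weaknesses: return the k weakest capabilities."""
    scores = {cap: score_fn(cap) for cap in capabilities}
    return sorted(scores, key=scores.get)[:k]

def generate_data(weaknesses, n_per_weakness=4):
    """Stand-in for multi-agent, weakness-focused data generation."""
    return [(w, f"sample_{w}_{i}") for w in weaknesses for i in range(n_per_weakness)]

def reinforce(scores, data):
    """Stand-in for targeted reinforcement: each sample nudges its capability's score."""
    updated = dict(scores)
    for weakness, _sample in data:
        updated[weakness] += 0.01
    return updated

def dpe_loop(initial_scores, iterations=3):
    """Spiral loop: each round re-diagnoses the updated model."""
    scores = dict(initial_scores)
    for _ in range(iterations):
        weaknesses = diagnose(scores.get, list(scores))
        data = generate_data(weaknesses)
        scores = reinforce(scores, data)
    return scores

# Toy run: "ocr" and "counting" are the blind spots and receive targeted gains.
caps = {"ocr": 0.4, "counting": 0.5, "spatial": 0.8, "chart": 0.9}
final = dpe_loop(caps)
```

In this toy run, only the two weakest capabilities improve each round while the strong ones are untouched, mirroring the paper's claim that diagnosis-steered data mixing concentrates training signal on blind spots rather than on repetitive practice.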
Problem

Research questions and friction points this paper is trying to address.

Large Multimodal Models
capability blind spots
diagnostic-driven training
dynamic reinforcement
continual improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diagnostic-driven Training
Iterative Reinforcement Learning
Multimodal Data Generation
Failure Attribution
Continual Model Improvement