Convergence Dynamics and Stabilization Strategies of Co-Evolving Generative Models

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates diversity collapse in co-evolving multimodal generative models (text and image), in which each modality serves as training data for the other in a dynamic feedback loop. We identify and formally characterize the bidirectional collapse and Matthew effect induced by such co-evolution, establishing the first theoretical model and proving that freezing either modality causes the other to degrade: a frozen image model drains text diversity, while a frozen text model drives exponential contraction of image diversity. To stabilize co-evolution, we propose an unsupervised external-corpus injection strategy, leveraging random text corpora or user-provided content, that requires neither annotations nor retraining. Our method integrates multinomial text modeling, conditional multivariate Gaussian image modeling, dynamical-systems analysis, and convergence proofs. Synthetic experiments demonstrate that the approach simultaneously enhances generation diversity and fidelity while effectively mitigating model collapse.

📝 Abstract
The increasing prevalence of synthetic data in training loops has raised concerns about model collapse, where generative models degrade when trained on their own outputs. While prior work focuses on this self-consuming process, we study an underexplored yet prevalent phenomenon: co-evolving generative models that shape each other's training through iterative feedback. This is common in multimodal AI ecosystems, such as social media platforms, where text models generate captions that guide image models, and the resulting images influence the future adaptation of the text model. We take a first step by analyzing such a system, modeling the text model as a multinomial distribution and the image model as a conditional multi-dimensional Gaussian distribution. Our analysis uncovers three key results. First, when one model remains fixed, the other collapses: a frozen image model causes the text model to lose diversity, while a frozen text model leads to an exponential contraction of image diversity, though fidelity remains bounded. Second, in fully interactive systems, mutual reinforcement accelerates collapse, with image contraction amplifying text homogenization and vice versa, leading to a Matthew effect where dominant texts sustain higher image diversity while rarer texts collapse faster. Third, we analyze stabilization strategies implicitly introduced by real-world external influences. Random corpus injections for text models and user-content injections for image models prevent collapse while preserving both diversity and fidelity. Our theoretical findings are further validated through experiments.
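The abstract's frozen-text result, an exponential contraction of image diversity under self-training that user-content injection prevents, can be illustrated with a toy self-consuming Gaussian loop. The function name, parameters, and injection scheme below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def image_variance_sim(rounds=300, n=50, inject=0.0, seed=0):
    """Refit a 1-D Gaussian image model on its own outputs each round.

    `inject` is the fraction of each round's training set replaced by
    fresh samples from the true data distribution N(0, 1), a toy
    stand-in for the paper's user-content injection.
    """
    rng = np.random.default_rng(seed)
    sigma2 = 1.0                 # image-model variance (diversity proxy)
    n_real = int(inject * n)
    for _ in range(rounds):
        synth = rng.normal(0.0, np.sqrt(sigma2), n - n_real)
        real = rng.normal(0.0, 1.0, n_real)
        # MLE refit on the mixed training set
        sigma2 = float(np.concatenate([synth, real]).var())
    return sigma2

collapsed = image_variance_sim(inject=0.0)    # variance shrinks toward 0
stabilized = image_variance_sim(inject=0.2)   # variance stays near 1
```

Without injection, each MLE refit shrinks the variance by roughly (n-1)/n in expectation, and these factors compound into exponential contraction; a small injected fraction of real samples anchors the variance near a stable fixed point.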
Problem

Research questions and friction points this paper is trying to address.

Analyzes co-evolving generative models' mutual influence and collapse.
Explores stabilization strategies to prevent diversity loss in models.
Validates theoretical findings with synthetic experiments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes co-evolving generative models dynamics
Introduces random corpus injections for text
Uses user-content injections for image models
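A matching toy for the text side: a multinomial caption model refit on its own samples drifts toward a single dominant token (a Wright-Fisher-style collapse), while mixing in a uniform external corpus each round keeps entropy high. The function name and the pseudo-count injection scheme are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def text_entropy_sim(rounds=500, n=50, k=10, inject=0.0, seed=1):
    """Refit a k-token multinomial text model on its own samples each round.

    `inject` mixes a fraction of uniform external-corpus pseudo-counts
    into each round's training counts, a toy stand-in for the paper's
    random-corpus injection.
    """
    rng = np.random.default_rng(seed)
    p = np.full(k, 1.0 / k)
    for _ in range(rounds):
        counts = rng.multinomial(n, p).astype(float)
        counts += inject * n / k      # uniform external pseudo-counts
        p = counts / counts.sum()
    # Shannon entropy (nats) of the final token distribution
    q = p[p > 0]
    return float(-(q * np.log(q)).sum())

collapsed = text_entropy_sim(inject=0.0)    # entropy decays toward 0
stabilized = text_entropy_sim(inject=0.2)   # entropy stays high
```

Pure resampling is an absorbing random walk: once a token's count hits zero it never returns, so diversity only shrinks. The injection term acts like a mutation pressure that keeps every token's probability bounded away from zero.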
Weiguo Gao
Beijing University of Posts and Telecommunications
natural language processing
Ming Li
School of Mathematical Sciences, Fudan University, Shanghai, 200433, China