🤖 AI Summary
How does inter-model recursive training, in which generative AI models serve as mutual sources of training data, affect performance evolution? This study presents the first systematic empirical investigation of cross-modal recursive training dynamics between multilingual language models and image generation models trained on synthetic data. Through multi-stage simulations, theoretical modeling, and large-scale experiments, we propose a data-mediated model-interaction framework. We find that cross-model data injection expands conceptual coverage but simultaneously induces task-performance convergence and output homogenization: performance gains and degradation co-occur, constituting a novel evolutionary mechanism. Our work reveals an intrinsic tension within the "generate–retrain" feedback loop of AI data ecosystems, exposing critical trade-offs in model sustainability, and provides both theoretical foundations and empirical warnings for responsible model evolution and data governance.
📝 Abstract
The internet is replete with AI-generated content and also serves as a common source of training data for generative AI (genAI) models. This duality raises the possibility that future genAI models will be trained on other models' generated outputs. Prior work has studied the consequences of models training on their own outputs, but little work has considered what happens when models ingest content produced by other models. Given society's increasing dependence on genAI tools, understanding the downstream effects of such data-mediated model interactions is critical. To this end, we provide empirical evidence for how data-mediated interactions might unfold in practice, develop a theoretical model of this interactive training process, and experimentally demonstrate possible long-term outcomes of such interactions. We find that data-mediated interactions can benefit models by exposing them to novel concepts that may have been missed in the original training data, but can also homogenize their performance on shared tasks.
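The homogenization dynamic described above can be sketched with a deliberately simplified toy model. This is our own illustration, not the paper's actual method: each "model" is reduced to a single 1-D Gaussian mean, and one generate–retrain round is approximated by moment matching on a mixture of the model's own samples and the other model's samples. The function name `retrain_round` and the mixing fraction `alpha` are hypothetical names introduced here for illustration.

```python
# Toy sketch of data-mediated model interaction (illustrative assumption:
# each model is a 1-D Gaussian summarized by its mean, and retraining is
# moment matching on a mixture of its own and the other model's outputs).

def retrain_round(mean_a, mean_b, alpha=0.3):
    """One generate-retrain round.

    alpha is the (hypothetical) fraction of each model's new training
    data drawn from the *other* model's generated outputs.
    """
    new_a = (1 - alpha) * mean_a + alpha * mean_b
    new_b = (1 - alpha) * mean_b + alpha * mean_a
    return new_a, new_b

def simulate(mean_a=0.0, mean_b=10.0, rounds=20, alpha=0.3):
    """Track the gap between the two models across retraining rounds."""
    gaps = [abs(mean_a - mean_b)]
    for _ in range(rounds):
        mean_a, mean_b = retrain_round(mean_a, mean_b, alpha)
        gaps.append(abs(mean_a - mean_b))
    return gaps

gaps = simulate()
```

Under these assumptions the gap shrinks geometrically by a factor of |1 - 2*alpha| per round, so the two models' output distributions converge: a cartoon of the paper's finding that cross-model data injection homogenizes outputs even while it transfers each model's concepts to the other.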