When Does Model Collapse Occur in Structured Interactive Learning?

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses "model collapse": the degradation in performance that arises when multiple models learn interactively from synthetic data generated by one another. By formalizing inter-model interactions as a directed graph, the work establishes, for the first time, necessary and sufficient conditions for model collapse in multi-model settings, extending prior analyses that were limited to single-model self-training. The theoretical framework combines directed graph topology, finite-sample analysis of linear regression, and asymptotic theory of M-estimators to characterize the collapse mechanism. Numerical experiments validate the theoretical findings and show how the structure of the interaction graph relates to the extent of performance degradation across models.

📝 Abstract

The proliferation of generative artificial intelligence has given rise to an interactive learning environment, where model parameters are continuously updated using not only data generated by natural processes, but also synthetic outputs produced by other models. This paradigm introduces two major challenges: (1) training data are no longer drawn exclusively from the target population, undermining a core assumption of classical statistical learning, and (2) model training processes become inherently correlated, as models interact with one another through repeated exposure to each other's synthetic outputs in a potentially complex manner. Establishing reliable statistical inference in such structured interactive learning environments therefore remains an important open problem. In particular, there is growing concern about model collapse, a phenomenon in which the performance of generative models progressively degrades as they are trained on synthetic data produced by earlier model generations. Prior work on model collapse primarily focuses on a single model trained on its own output, failing to capture model performance in multi-model interactive settings. In this work, we fill this gap by investigating the performance of generative models in an interactive learning environment with general interaction patterns. In particular, we formalize model interactions using directed graphs and show that the occurrence of model collapse depends critically on the topology of the interaction graph. We further derive an explicit necessary and sufficient condition characterizing when model collapse occurs, and establish finite-sample results for linear regression and asymptotic guarantees for general M-estimators. We support our theoretical findings through extensive numerical experiments.
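To make the setting in the abstract concrete, the following is a minimal sketch of interactive learning on a directed graph with linear regression models. It is an illustration under stated assumptions, not the paper's experimental setup or its collapse condition. The function names (`fit_ols`, `simulate`), the edge convention (an edge `(i, j)` means model `j` trains on synthetic data from model `i`), the Gaussian design, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np

def fit_ols(X, y):
    # Ordinary least squares estimate of the regression coefficients.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(edges, k, real_nodes=(), n=50, d=3, sigma=1.0,
             generations=30, seed=0):
    """Simulate k interacting linear regression models.

    Assumed convention (not taken from the paper): an edge (i, j) means
    model j is retrained on synthetic data generated by model i.
    Nodes listed in real_nodes also receive fresh real data each round.
    Returns the mean squared parameter error after each generation.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.ones(d)  # hypothetical ground-truth coefficients

    def sample(beta):
        # Draw n labelled points from the linear model defined by beta.
        X = rng.normal(size=(n, d))
        y = X @ beta + sigma * rng.normal(size=n)
        return X, y

    # Generation 0: every model is fit once on real data.
    betas = [fit_ols(*sample(beta_true)) for _ in range(k)]
    errors = []
    for _ in range(generations):
        new_betas = []
        for j in range(k):
            # Parameters of the models whose output node j trains on.
            sources = [betas[i] for (i, t) in edges if t == j]
            if j in real_nodes:
                sources.append(beta_true)  # fresh real data
            if not sources:
                new_betas.append(betas[j])  # no incoming data: keep model
                continue
            batches = [sample(b) for b in sources]
            X = np.vstack([b[0] for b in batches])
            y = np.concatenate([b[1] for b in batches])
            new_betas.append(fit_ols(X, y))
        betas = new_betas
        errors.append(float(np.mean(
            [np.sum((b - beta_true) ** 2) for b in betas])))
    return errors

# Two models that train only on each other, with and without real data.
cycle = [(0, 1), (1, 0)]
closed = np.mean([simulate(cycle, 2, seed=s)[-1] for s in range(20)])
anchored = np.mean(
    [simulate(cycle, 2, real_nodes={0}, seed=s)[-1] for s in range(20)])
print(f"closed cycle: {closed:.3f}, real data at node 0: {anchored:.3f}")
```

In this toy setup, a cycle with no access to real data lets estimation noise accumulate across generations, while feeding real data into one node keeps the error from drifting in the same way. Which graph topologies lead to collapse in general is the question the paper's necessary and sufficient condition addresses; this sketch does not reproduce that condition.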

Problem

Research questions and friction points this paper is trying to address.

model collapse

interactive learning

generative models

synthetic data

interaction topology

Innovation

Methods, ideas, or system contributions that make the work stand out.

model collapse

interactive learning

interaction graph