🤖 AI Summary
This study investigates how human curation affects preference alignment when multiple foundation models train on data generated by one another. It formalizes a framework for interacting self-consuming models and characterizes when the resulting dynamical system converges to a stable point. The analysis separates the effect of curating one model on its own alignment (self-influence) from the effect that propagates to other models (cross-influence). In contrast to the single-model setting, where prior work shows that curation steers a model toward human-aligned behavior, cross-model interactions can dampen or even invert this effect and degrade long-term alignment, which challenges the assumption that human feedback is inherently beneficial for alignment.
📝 Abstract
Foundation models are increasingly trained on synthetic data generated by prior model iterations rather than exclusively on real data. This self-consuming training paradigm can lead to model collapse, divergence, or bias amplification. Recent work (Ferbach et al., 2024) shows that incorporating human curation into the loop can steer a self-consuming model toward human-aligned behavior, but these analyses focus on a single, isolated model that solely consumes its own outputs. In practice, however, models often interact and train on input-output pairs produced by other models. This paper studies self-consuming training in the multi-model regime. We first formalize a framework for interacting self-consuming models and characterize when the resulting dynamical system converges to a stable point. We then examine how human curation of one model affects its own alignment (self-influence) and how such effects propagate to other models (cross-influence). Unlike isolated settings where human curation always enhances model alignment, we show that cross-model interactions can dampen or even invert this effect, ultimately degrading long-term alignment.
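To make the setup concrete, the following is a minimal toy sketch of interacting self-consuming models. It is not the paper's formalization. Every name and modeling choice here is an assumption made for illustration: each model is a categorical distribution over a few items, the mixing matrix `mix` sets how much of model i's training pool comes from model j, and curation is modeled as an exponential reweighting of the pool by a per-model reward (`curate`, `beta`).

```python
import math


def curate(p, reward, beta=1.0):
    """Assumed curation rule: reweight a distribution toward high-reward items."""
    weights = [pi * math.exp(beta * ri) for pi, ri in zip(p, reward)]
    total = sum(weights)
    return [w / total for w in weights]


def step(P, rewards, mix, curated, beta=1.0):
    """One retraining round for all models.

    P[i]       : output distribution of model i (sums to 1)
    rewards[i] : reward used by the curator of model i (hypothetical)
    mix[i][j]  : share of model i's training pool drawn from model j
                 (each row is assumed to sum to 1)
    curated[i] : whether model i's pool is curated before retraining
    """
    n_models, n_items = len(P), len(P[0])
    new_P = []
    for i in range(n_models):
        # Training pool of model i: a mixture of all models' outputs.
        pool = [sum(mix[i][j] * P[j][x] for j in range(n_models))
                for x in range(n_items)]
        # In this toy, the retrained model simply matches its (curated) pool.
        new_P.append(curate(pool, rewards[i], beta) if curated[i] else pool)
    return new_P


def simulate(P0, rewards, mix, curated, steps, beta=1.0):
    """Iterate the toy dynamics for a fixed number of rounds."""
    P = [list(row) for row in P0]
    for _ in range(steps):
        P = step(P, rewards, mix, curated, beta)
    return P


def expected_reward(p, reward):
    """Alignment proxy: expected reward of a model's outputs."""
    return sum(pi * ri for pi, ri in zip(p, reward))


if __name__ == "__main__":
    P0 = [[0.25] * 4, [0.25] * 4]
    rewards = [[0.0, 1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 0.0]]
    isolated = [[1.0, 0.0], [0.0, 1.0]]   # each model consumes only itself
    coupled = [[0.5, 0.5], [0.5, 0.5]]    # models also consume each other
    for name, mix in [("isolated", isolated), ("coupled", coupled)]:
        P = simulate(P0, rewards, mix, [True, True], steps=10)
        print(name, [expected_reward(P[i], rewards[i]) for i in range(2)])
```

Comparing the `isolated` and `coupled` runs gives a rough feel for self-influence versus cross-influence under these assumptions. The toy makes no claim to reproduce the paper's convergence conditions or its results.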