🤖 AI Summary
This paper investigates the recursive evolution of user-preference alignment in self-consuming generative models—where models continuously retrain on their own outputs—transforming alignment from a one-shot optimization into a path-dependent, power-structured dynamic process. We propose a theoretical framework grounded in a two-stage Bradley–Terry model, integrating dynamic social choice theory, game-theoretic analysis, and path-dependence formalism to establish, for the first time, a rigorous foundation for long-term alignment under recursive training. We characterize three fundamental convergence regimes: consensus collapse, shared optimal compromise, and asymmetric refinement. Moreover, we prove an impossibility theorem demonstrating that diversity preservation, symmetric influence between stakeholders, and initial-condition independence are mutually incompatible under recursive alignment. These results shift AI alignment from a static-objective paradigm to an evolutionary-equilibrium paradigm, providing a novel theoretical basis for sustainable alignment.
📝 Abstract
In self-consuming generative models that train on their own outputs, alignment with user preferences becomes a recursive rather than one-time process. We provide the first formal foundation for analyzing the long-term effects of such recursive retraining on alignment. Under a two-stage curation mechanism based on the Bradley–Terry (BT) model, we model alignment as an interaction between two factions: the Model Owner, who filters which outputs the model learns from, and the Public User, who determines which outputs are ultimately shared and retained through interactions with the model. Our analysis reveals three structural convergence regimes depending on the degree of preference alignment: consensus collapse, compromise on shared optima, and asymmetric refinement. We prove a fundamental impossibility theorem: no recursive BT-based curation mechanism can simultaneously preserve diversity, ensure symmetric influence, and eliminate dependence on initialization. Framing the process as dynamic social choice, we show that alignment is not a static goal but an evolving equilibrium, shaped by both power asymmetries and path dependence.
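The two-stage curation loop described above can be sketched as a toy simulation. This is a minimal illustration under strong simplifying assumptions: the one-dimensional output space, the quadratic utilities, the specific filtering and retraining rules, and all names and parameters below are our own illustrative choices, not the paper's formulation.

```python
import math
import random

random.seed(0)

def bt_prefer(score_a, score_b):
    """Bradley-Terry probability that output a is preferred over output b."""
    return math.exp(score_a) / (math.exp(score_a) + math.exp(score_b))

def curate(pool, utility, n_keep):
    """One BT curation stage: sample pairs and keep the stochastic winners."""
    kept = []
    for _ in range(n_keep):
        a, b = random.sample(pool, 2)
        kept.append(a if random.random() < bt_prefer(utility(a), utility(b)) else b)
    return kept

# Toy "model": a Gaussian over a 1-D output space, retrained each round on
# the mean of the retained outputs (the self-consuming loop).
owner_opt, user_opt = 0.3, 0.7              # the two factions' preferred outputs
owner_u = lambda x: -(x - owner_opt) ** 2   # Model Owner utility (assumed)
user_u = lambda x: -(x - user_opt) ** 2     # Public User utility (assumed)

mean, spread = 0.0, 1.0
for _ in range(50):
    outputs = [random.gauss(mean, spread) for _ in range(200)]
    stage1 = curate(outputs, owner_u, 100)  # stage 1: Owner filters training data
    stage2 = curate(stage1, user_u, 50)     # stage 2: User retains shared outputs
    mean = sum(stage2) / len(stage2)        # retrain on the curated outputs
    spread *= 0.98                          # diversity contracts under recursion

print(mean, spread)
```

In this sketch the model's mean tends to settle between the two factions' optima (a compromise regime), while the shrinking spread illustrates the diversity loss and initialization dependence that the impossibility theorem says cannot all be avoided at once.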