Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of large language models to self-consuming performative loops (SCPL) during iterative training on self-generated data, which exacerbates preference bias and undermines group representativeness. The study presents the first systematic characterization and analysis of the dynamic evolution of bias within SCPL, uncovering the counterintuitive phenomenon wherein preference bias intensifies while disparate bias diminishes. To mitigate this issue, the authors propose a reward-driven rejection sampling strategy grounded in a controllable feedback mechanism, effectively alleviating bias under both full retraining and incremental fine-tuning settings. Empirical evaluations across three real-world tasks demonstrate that the proposed approach significantly reduces the bias risks introduced by synthetic data and enhances the trustworthiness of self-improving systems.
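To make the performative feedback mechanism concrete, here is a minimal sketch of how group representation can collapse over iterations: groups the model serves poorly submit proportionally fewer queries in the next round. The update rule, group counts, and `sensitivity` parameter are illustrative assumptions, not the paper's actual model.

```python
def performative_step(shares, quality, sensitivity=1.0):
    """One round of performative feedback.

    `shares`  -- per-group fraction of the query stream (sums to 1)
    `quality` -- per-group service quality in (0, 1]; lower means
                 the model underserves that group
    Groups served worse contribute proportionally fewer queries
    next round (a hypothetical multiplicative update).
    """
    weighted = [s * q ** sensitivity for s, q in zip(shares, quality)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Two demographic groups start with equal query share, but the
# model underserves group 2 (quality 0.6 vs 0.9).
shares = [0.5, 0.5]
quality = [0.9, 0.6]
for _ in range(5):
    shares = performative_step(shares, quality)
# After a few rounds, group 2's share of the data has shrunk sharply,
# so subsequent retraining sees it even less -- the loop the paper studies.
```

Under these toy numbers, group 2's share falls from 0.5 to roughly 0.12 within five rounds, illustrating how the loop erodes group representativeness even without any change to the model itself.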

📝 Abstract
The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop, where models are trained on their own outputs and may cause performance drops and induce emerging biases. In real-world applications, previously deployed LLMs may influence the data they generate, leading to a dynamic system driven by user feedback. For example, if a model continues to underserve users from a group, less query data will be collected from this particular demographic of users. In this study, we introduce the concept of Self-Consuming Performative Loop (SCPL) and investigate the role of synthetic data in shaping bias during these dynamic iterative training processes under controlled performative feedback. This controlled setting is motivated by the inaccessibility of real-world user preference data from dynamic production systems, and enables us to isolate and analyze feedback-driven bias evolution in a principled manner. We focus on two types of loops, including the typical retraining setting and the incremental fine-tuning setting, which is largely underexplored. Through experiments on three real-world tasks, we find that the performative loop increases preference bias and decreases disparate bias. We design a reward-based rejection sampling strategy to mitigate the bias, moving towards more trustworthy self-improving systems.
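The mitigation the abstract names, reward-based rejection sampling, can be sketched as follows. This is a generic rejection-sampling skeleton under stated assumptions, not the paper's implementation: the generator, reward function, and threshold below are placeholders for the paper's LLM, learned reward model, and tuned cutoff.

```python
import random

def rejection_sample(generate, reward_fn, threshold, max_tries=16):
    """Reward-driven rejection sampling (illustrative sketch).

    Draw synthetic candidates from `generate` and return the first
    whose reward clears `threshold`; if none does within `max_tries`,
    fall back to the best-scoring candidate seen. Filtering synthetic
    data this way before it re-enters training is the general idea
    behind the paper's mitigation.
    """
    best, best_r = None, float("-inf")
    for _ in range(max_tries):
        cand = generate()
        r = reward_fn(cand)
        if r >= threshold:
            return cand
        if r > best_r:
            best, best_r = cand, r
    return best

# Toy usage: candidates are random scores in [0, 1]; the (hypothetical)
# reward model is the identity, and we keep generations scoring >= 0.8.
rng = random.Random(0)
sample = rejection_sample(lambda: rng.random(), lambda x: x, 0.8)
```

The design trade-off is the usual one for rejection sampling: a higher threshold yields cleaner synthetic data but discards more generations, so the cutoff must balance bias reduction against data efficiency in each loop iteration.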
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Synthetic Data
Bias
Performative Feedback
Self-Consuming Loop
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consuming Performative Loop
synthetic data bias
performative feedback
reward-based rejection sampling
incremental fine-tuning