Objective Decoupling in Social Reinforcement Learning: Recovering Ground Truth from Sycophantic Majorities

πŸ“… 2026-02-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In social reinforcement learning, human feedback is often distorted by flattery, laziness, or adversarial intent, undermining conventional alignment approaches. This work formalizes that failure mode as objective decoupling and proposes Epistemic Source Alignment (ESA), a method that introduces a "judge-the-judges" mechanism: instead of relying on majority consensus, it assesses the credibility of each feedback source against a sparse set of safety axioms. Theoretically, ESA is proven to converge to the true objective; empirically, it recovers the optimal policy even when biased evaluators constitute the majority, significantly outperforming consensus-based methods.

πŸ“ Abstract
Contemporary AI alignment strategies rely on a fragile premise: that human feedback, while noisy, remains a fundamentally truthful signal. In this paper, we identify this assumption as Dogma 4 of Reinforcement Learning (RL). We demonstrate that while this dogma holds in static environments, it fails in social settings where evaluators may be sycophantic, lazy, or adversarial. We prove that under Dogma 4, standard RL agents suffer from what we call Objective Decoupling, a structural failure mode where the agent's learned objective permanently separates from the latent ground truth, guaranteeing convergence to misalignment. To resolve this, we propose Epistemic Source Alignment (ESA). Unlike standard robust methods that rely on statistical consensus (trusting the majority), ESA utilizes sparse safety axioms to judge the source of the feedback rather than the signal itself. We prove that this "judging the judges" mechanism guarantees convergence to the true objective, even when a majority of evaluators are biased. Empirically, we show that while traditional consensus methods fail under majority collusion, our approach successfully recovers the optimal policy.
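The paper's exact ESA algorithm is not reproduced here; the toy sketch below only illustrates the general "judge the judges" idea under our own assumptions. Each evaluator is scored by its agreement with a small set of probe items whose ground-truth labels are known (standing in for the sparse safety axioms), and votes are then weighted by that credibility. All names (`credibility`, `esa_label`, the evaluator setup) are hypothetical, not from the paper.

```python
import random

random.seed(0)

NUM_ITEMS, NUM_PROBES = 200, 10

# Ground truth over all items; the first NUM_PROBES items are "safety axiom"
# probes with labels known to the aligner (balanced 0/1, so a constant
# "approve everything" evaluator scores only at chance on them).
ground_truth = [i % 2 for i in range(NUM_PROBES)]
ground_truth += [random.choice([0, 1]) for _ in range(NUM_ITEMS - NUM_PROBES)]

def honest(i):
    """Truthful evaluator: reports the latent ground truth."""
    return ground_truth[i]

def sycophant(i):
    """Sycophantic evaluator: approves everything regardless of truth."""
    return 1

evaluators = [honest] * 2 + [sycophant] * 5  # biased evaluators are the majority

def credibility(ev):
    """Judge the judge: fraction of axiom probes the evaluator gets right."""
    hits = sum(ev(i) == ground_truth[i] for i in range(NUM_PROBES))
    return hits / NUM_PROBES

# Map chance-level credibility (0.5) to zero weight, perfect (1.0) to one,
# so evaluators no better than chance on the axioms carry no vote.
weights = [max(0.0, 2 * credibility(ev) - 1) for ev in evaluators]

def esa_label(i):
    """Credibility-weighted vote over evaluators."""
    score = sum(w * (1 if ev(i) else -1) for ev, w in zip(evaluators, weights))
    return 1 if score > 0 else 0

def majority_label(i):
    """Baseline: unweighted majority consensus."""
    return 1 if sum(ev(i) for ev in evaluators) > len(evaluators) / 2 else 0

test_items = range(NUM_PROBES, NUM_ITEMS)
esa_acc = sum(esa_label(i) == ground_truth[i] for i in test_items) / len(test_items)
maj_acc = sum(majority_label(i) == ground_truth[i] for i in test_items) / len(test_items)
print(f"ESA accuracy: {esa_acc:.2f}, majority-vote accuracy: {maj_acc:.2f}")
```

In this toy setting the sycophantic majority drags plain majority voting to roughly chance accuracy, while the credibility-weighted vote recovers the ground truth exactly, mirroring the qualitative claim of the abstract (though not its formal guarantee).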
Problem

Research questions and friction points this paper is trying to address.

Objective Decoupling
Social Reinforcement Learning
AI Alignment
Sycophantic Feedback
Ground Truth Recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Objective Decoupling
Epistemic Source Alignment
Social Reinforcement Learning
AI Alignment
Sycophantic Feedback
πŸ”Ž Similar Papers
No similar papers found.