Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories

📅 2025-09-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit "sycophancy" during reasoning: they uncritically accept and reinforce erroneous user claims due to excessive alignment with the user. Method: We formulate sycophancy suppression as an uncertainty-aware adaptive reasoning trajectory optimization problem. We propose a reward mechanism that jointly supervises stepwise progress and final outcomes, and introduce Uncertainty-Aware Monte Carlo Tree Search (UA-MCTS) to guide reinforcement learning fine-tuning without requiring additional annotated data. UA-MCTS dynamically adjusts exploration to detect and correct user belief biases. Contribution/Results: Our approach significantly reduces sycophancy rates while preserving strong out-of-distribution generalization. Empirical results demonstrate that optimizing the reasoning process, not just the output, enhances factual consistency. The method is data-efficient, leveraging only the LLM's internal uncertainty estimates to steer reasoning toward truthfulness and robustness.
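The summary does not give UA-MCTS's exploration rule, but one plausible reading is a UCB-style selection score whose exploration term widens with the model's state-level uncertainty. The sketch below is illustrative only: the `Node` structure, `ua_select`, `base_c`, and the `(1 + uncertainty)` scaling are assumptions, not the paper's exact formulation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """One reasoning state in the search tree (illustrative structure)."""
    uncertainty: float      # state-level uncertainty in [0, 1], e.g. normalized token entropy
    visits: int = 0
    value_sum: float = 0.0  # accumulated reward from rollouts through this node
    children: list = field(default_factory=list)

def ua_select(parent: Node, base_c: float = 1.4) -> Node:
    """Choose the child with the highest uncertainty-scaled UCB score.

    The (1 + parent.uncertainty) factor widens exploration where the model
    is unsure, which is one way UA-MCTS could "dynamically adjust
    exploration"; the paper's actual rule may differ.
    """
    def score(child: Node) -> float:
        exploit = child.value_sum / max(child.visits, 1)
        explore = base_c * math.sqrt(math.log(parent.visits + 1) / (child.visits + 1))
        return exploit + (1.0 + parent.uncertainty) * explore
    return max(parent.children, key=score)
```

Under this reading, high-uncertainty states (e.g., where a user's claim conflicts with the model's knowledge) get broader search, giving the trajectory collector a chance to surface non-sycophantic continuations.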

📝 Abstract
Despite the remarkable capabilities of large language models, current training paradigms inadvertently foster *sycophancy*, i.e., the tendency of a model to agree with or reinforce user-provided information even when it's factually incorrect. To address this challenge, we introduce **SMART** (Sycophancy Mitigation through Adaptive Reasoning Trajectories), which reframes sycophancy as a *reasoning optimization problem* rather than an output alignment issue. SMART is a two-stage framework comprising: (1) Uncertainty-Aware Adaptive Monte Carlo Tree Search (UA-MCTS), which dynamically adjusts model exploration based on state-level uncertainty to collect high-quality, diverse reasoning trajectories alongside both stepwise progress and final outcome rewards; and (2) progress-based reinforcement learning, which fine-tunes the model using the collected trajectories and reward signals to reinforce effective reasoning patterns. Through extensive experiments, we show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs and maintaining general capabilities. These results underscore the importance of optimizing internal reasoning mechanisms to build more truthful and aligned AI assistants.
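As a rough sketch of the training signal stage (2) might use, per-step progress rewards could be blended with a terminal outcome reward. The function below, its `alpha` weighting, and the plus/minus-one outcome coding are assumptions for illustration; the abstract does not specify the exact reward shaping.

```python
def trajectory_reward(step_progress: list, final_correct: bool,
                      alpha: float = 0.5) -> float:
    """Blend stepwise progress with the final outcome (illustrative only).

    step_progress : per-step progress signals, e.g. the change in an
                    estimated value toward the correct answer at each step
    final_correct : whether the trajectory's final answer is correct
    alpha         : hypothetical weight between process and outcome terms
    """
    progress = sum(step_progress) / max(len(step_progress), 1)
    outcome = 1.0 if final_correct else -1.0
    return alpha * progress + (1.0 - alpha) * outcome

# Example: steady progress (mean 0.2) plus a correct final answer.
r = trajectory_reward([0.2, 0.3, 0.1], final_correct=True)  # -> 0.6
```

A trajectory that sycophantically follows a wrong user claim would incur a negative outcome term even if individual steps look locally plausible, which is the sense in which process and outcome supervision are complementary.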
Problem

Research questions and friction points this paper is trying to address.

Mitigating sycophancy in language models that agree with incorrect user information
Reframing sycophancy as reasoning optimization rather than output alignment
Developing uncertainty-aware reinforcement learning to improve reasoning patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty-Aware Adaptive Monte Carlo Tree Search
Progress-based reinforcement learning fine-tuning
Reframing sycophancy as reasoning optimization problem
Authors
Mohammad Beigi
University of California, Davis
Ying Shen
University of Illinois Urbana-Champaign
Parshin Shojaee
Virginia Tech
Qifan Wang
Meta AI
Zichao Wang
Adobe Research
document AI, AI for education, natural language processing, machine learning
Chandan Reddy
Virginia Tech
Ming Jin
Virginia Tech
Lifu Huang
Assistant Professor, UC Davis
Natural Language Processing, Multimodal Learning, AI for Science, Multilingual