🤖 AI Summary
Existing research evaluates the faithfulness of Chain-of-Thought (CoT) reasoning under two disjoint paradigms, contextual and parametric, without a systematic understanding of their interplay. This work proposes FaithMate, a unified preference-alignment interface for optimizing models towards either faithfulness paradigm. Contextual faithfulness is measured by perturbing the input or the CoT trace, and parametric faithfulness by intervening on the model's parametric knowledge. Using this interface, the study examines how the two faithfulness types are coupled and how well faithfulness gains generalize within and across paradigms, covering three models, two datasets, and six faithfulness metrics. The findings reveal a positive but asymmetric coupling: optimizing towards parametric faithfulness yields consistent gains in both paradigms, whereas contextual optimization yields more variable gains. Within the contextual paradigm, gains on one metric do not consistently transfer to others, which exposes inherent trade-offs among existing contextual metrics. These results challenge the assumption that CoT faithfulness is a single objective and underscore the need for multifaceted optimization and evaluation.
📝 Abstract
Chain-of-Thought (CoT) faithfulness, i.e., whether CoTs genuinely reflect the underlying behavior of large language models (LLMs), is typically evaluated under two disjoint paradigms: contextual faithfulness, measured by perturbing the input or CoT trace, and parametric faithfulness, assessed by intervening on a model's parametric knowledge. Yet prior work compares them only descriptively. We fill this gap by proposing FaithMate, a unified preference-alignment interface for optimizing models towards either faithfulness paradigm. It enables us to investigate the interplay between the two paradigms, examining whether and to what extent faithfulness gains generalize within and across paradigms. Across three models, two datasets, and six faithfulness metrics, we find that the two paradigms are positively coupled, yet asymmetric: optimizing towards parametric faithfulness yields consistent gains across both paradigms, whereas the contextual counterpart delivers more variable gains. Within the contextual paradigm, faithfulness gains on one metric do not consistently transfer to others, implying that existing contextual metrics capture disjoint facets of faithfulness and exposing inherent trade-offs. These findings imply that CoT faithfulness is not a monolithic objective and therefore requires multifaceted optimization and evaluation.