🤖 AI Summary
Multimodal large language models (MLLMs) frequently exhibit behavioral unfaithfulness, where reasoning steps contradict the final output, and perceptual unfaithfulness, where reasoning diverges from the visual input, leading to hallucinations and unstable inference. The paper explicitly distinguishes these two dimensions of faithfulness and proposes FaithAct, a planning-and-acting framework centered on evidence anchoring: it grounds each reasoning step in visual evidence, introduces step-level and chain-level faithfulness evaluation, and establishes FaithEval, a quantitative, multi-granular benchmark for faithfulness assessment. Evaluated across multiple multimodal reasoning benchmarks, FaithAct improves perceptual faithfulness by up to 26% without sacrificing task accuracy, significantly mitigating hallucination and stabilizing reasoning trajectories. Key contributions include (1) a two-dimensional model of faithfulness, (2) an evidence-driven reasoning control framework, and (3) a quantifiable, multi-granular faithfulness evaluation system.
📝 Abstract
Unfaithfulness remains a persistent challenge for multimodal large language models (MLLMs), which often produce plausible yet ungrounded reasoning chains that diverge from perceptual evidence or final conclusions. We distinguish between behavioral faithfulness (alignment between reasoning and output) and perceptual faithfulness (alignment between reasoning and input), and introduce FaithEval for quantifying step-level and chain-level faithfulness by evaluating whether each claimed object is visually supported by the image. Building on these insights, we propose FaithAct, a faithfulness-first planning and acting framework that enforces evidential grounding at every reasoning step. Experiments across multiple reasoning benchmarks demonstrate that FaithAct improves perceptual faithfulness by up to 26% without degrading task accuracy compared to prompt-based and tool-augmented baselines. Our analysis shows that treating faithfulness as a guiding principle not only mitigates hallucination but also leads to more stable reasoning trajectories. This work thereby establishes a unified framework for both evaluating and enforcing faithfulness in multimodal reasoning.
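The scoring idea behind FaithEval, as described above, can be sketched in a few lines: a step's faithfulness is the fraction of objects it claims that are actually supported by the image, and a chain's faithfulness aggregates over its steps. The sketch below is a minimal illustration under those assumptions; the function names are hypothetical, object extraction and visual verification are stubbed with plain sets, and the paper's actual aggregation may differ.

```python
# Hypothetical sketch of step- and chain-level faithfulness scoring.
# `detected_objects` stands in for whatever visual grounding module
# (e.g. an object detector) confirms is present in the image.

def step_faithfulness(claimed_objects, detected_objects):
    """Fraction of objects claimed in one reasoning step that are visually supported."""
    if not claimed_objects:
        return 1.0  # a step that claims nothing is vacuously faithful
    supported = sum(1 for obj in claimed_objects if obj in detected_objects)
    return supported / len(claimed_objects)

def chain_faithfulness(steps, detected_objects):
    """Mean step-level faithfulness over a whole reasoning chain."""
    if not steps:
        return 1.0
    scores = [step_faithfulness(step, detected_objects) for step in steps]
    return sum(scores) / len(scores)

# Toy example: the detector finds a dog, a frisbee, and grass in the image,
# but step 2 of the reasoning chain hallucinates a "cat".
detected = {"dog", "frisbee", "grass"}
chain = [["dog", "frisbee"], ["cat", "grass"]]
print(step_faithfulness(chain[1], detected))  # 0.5: one of two claims supported
print(chain_faithfulness(chain, detected))    # 0.75: mean of 1.0 and 0.5
```

A faithfulness-first controller in the spirit of FaithAct could then gate each step on this score, revising or discarding steps whose claims lack visual support before acting on them.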