FaithAct: Faithfulness Planning and Acting in MLLMs

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) frequently exhibit behavioral unfaithfulness (reasoning steps that contradict the final output) and perceptual unfaithfulness (reasoning that diverges from the visual input), leading to hallucinations and unstable inference. This paper is the first to explicitly distinguish these two dimensions of faithfulness and proposes FaithAct, a planning-and-acting framework centered on evidence anchoring: it constrains each reasoning step with visual evidence, introduces step-level and chain-level faithfulness evaluation mechanisms, and establishes FaithEval, a quantitative, multi-granular benchmark for faithfulness assessment. Evaluated across multiple multimodal reasoning benchmarks, FaithAct improves perceptual faithfulness by up to 26% without sacrificing task accuracy, significantly mitigating hallucination and stabilizing reasoning trajectories. Key contributions include (1) a two-dimensional faithfulness model, (2) an evidence-driven reasoning-control framework, and (3) a quantifiable, multi-granular faithfulness evaluation system.

📝 Abstract
Unfaithfulness remains a persistent challenge for large language models (LLMs), which often produce plausible yet ungrounded reasoning chains that diverge from perceptual evidence or final conclusions. We distinguish between behavioral faithfulness (alignment between reasoning and output) and perceptual faithfulness (alignment between reasoning and input), and introduce FaithEval for quantifying step-level and chain-level faithfulness by evaluating whether each claimed object is visually supported by the image. Building on these insights, we propose FaithAct, a faithfulness-first planning and acting framework that enforces evidential grounding at every reasoning step. Experiments across multiple reasoning benchmarks demonstrate that FaithAct improves perceptual faithfulness by up to 26% without degrading task accuracy compared to prompt-based and tool-augmented baselines. Our analysis shows that treating faithfulness as a guiding principle not only mitigates hallucination but also leads to more stable reasoning trajectories. This work thereby establishes a unified framework for both evaluating and enforcing faithfulness in multimodal reasoning.
Problem

Research questions and friction points this paper is trying to address.

Addresses unfaithful reasoning chains in multimodal large language models
Quantifies perceptual and behavioral faithfulness through step-level evaluation
Enforces evidential grounding at every reasoning step to reduce hallucinations
Innovation

Methods, ideas, or system contributions that make the work stand out.

FaithAct framework enforces evidential grounding in reasoning
FaithEval quantifies step-level and chain-level faithfulness
Faithfulness-first planning improves perceptual faithfulness by up to 26%
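The paper does not include its scoring code here, but the abstract describes FaithEval as checking "whether each claimed object is visually supported by the image" at both the step and chain level. A minimal sketch of such a metric, assuming the step score is the fraction of a step's claimed objects confirmed by a detector and the chain score averages over steps (the names `ReasoningStep`, `step_faithfulness`, and `chain_faithfulness` are hypothetical, not from the paper):

```python
from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """One step of a reasoning chain and the objects it claims are visible."""
    text: str
    claimed_objects: list


def step_faithfulness(step: ReasoningStep, detected_objects: set) -> float:
    """Fraction of the step's claimed objects found among detected objects."""
    if not step.claimed_objects:
        return 1.0  # a step with no visual claims cannot be perceptually unfaithful
    supported = sum(1 for obj in step.claimed_objects if obj in detected_objects)
    return supported / len(step.claimed_objects)


def chain_faithfulness(steps: list, detected_objects: set) -> float:
    """Mean step-level faithfulness across the whole reasoning chain."""
    if not steps:
        return 1.0
    scores = [step_faithfulness(s, detected_objects) for s in steps]
    return sum(scores) / len(scores)
```

For example, if a detector finds `{"dog", "frisbee"}` in the image, a step claiming a dog and a frisbee scores 1.0, a step claiming a cat scores 0.0, and the two-step chain scores 0.5. Real faithfulness scoring would need claim extraction from free-form step text and a visual grounding model rather than exact set membership; this sketch only fixes the aggregation logic.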
Junxian Li
NSEC Lab, Shanghai Jiao Tong University
AI security, Reasoning, Data Mining
Xinyue Xu
The Hong Kong University of Science and Technology
Sai Ma
Federal Reserve Board of Governors
Macro Finance, Asset Pricing
Sichao Li
City University of Macau