Proof-of-Perception: Certified Tool-Using Multimodal Reasoning with Compositional Conformal Guarantees

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unreliability of multimodal reasoning, where poorly calibrated uncertainty lets errors propagate across steps and produces hallucinations. The authors propose an executable reasoning graph framework in which perception and logic operations are nodes that each output a conformal prediction set, giving calibrated, stepwise uncertainty. A lightweight controller uses these sets to schedule tool calls within a compute budget, adding calls only when needed and stopping early otherwise. The approach is presented as the first compositional conformal guarantee mechanism: it grounds answers in verifiable evidence, limits error accumulation, and allows controllable trade-offs between computation and accuracy. Across document, chart, and multi-image question-answering benchmarks, it improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
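
As an illustration of what a compositional guarantee can look like, the following is a standard union-bound argument, not necessarily the composition rule used in the paper. The symbols K, C_k, Y_k, and alpha_k are introduced here for illustration only, and the argument assumes each node's marginal conformal guarantee holds.

```latex
% Assumption: node k outputs a set C_k with marginal coverage at least 1 - \alpha_k.
\Pr\bigl[\,Y_k \in C_k\,\bigr] \;\ge\; 1 - \alpha_k, \qquad k = 1, \dots, K.

% Union bound over the K failure events (valid under arbitrary dependence between nodes):
\Pr\Bigl[\,\bigcap_{k=1}^{K} \{Y_k \in C_k\}\Bigr]
  \;\ge\; 1 - \sum_{k=1}^{K} \Pr\bigl[\,Y_k \notin C_k\,\bigr]
  \;\ge\; 1 - \sum_{k=1}^{K} \alpha_k.

% Worked example: K = 4 nodes, each with \alpha_k = 0.025,
% gives joint coverage of at least 1 - 4(0.025) = 0.90.
```

The bound is conservative: it tightens as per-node error levels shrink and loosens linearly with the number of nodes on a reasoning path.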

📝 Abstract
We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
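
The idea of a node that emits a conformal set and a controller that escalates only when needed can be sketched as follows. This is a minimal sketch using standard split conformal prediction with the nonconformity score 1 - p; it is not the authors' implementation. The names `conformal_threshold`, `prediction_set`, and `run_node`, the tool tuple layout, and the "stop at a singleton set" rule are all hypothetical choices made for illustration. The sketch shows the control flow only and does not establish that coverage is preserved under adaptive stopping.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to the trivial (full) set.
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Keep every label whose nonconformity score (1 - probability) is within the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}


def run_node(tools, budget):
    """Hypothetical controller for one node.

    `tools` is a list of (cost, tool_fn, threshold) tuples ordered from cheapest
    to most expensive; each tool_fn returns a dict of label -> probability and has
    its own calibrated threshold. Escalate while the set is ambiguous and the
    budget allows; stop early once the set is a single label.
    Returns (prediction set or None if no tool was affordable, compute spent).
    """
    spent = 0
    result = None
    for cost, tool_fn, threshold in tools:
        if spent + cost > budget:
            break  # the next tool call would exceed the budget
        spent += cost
        result = prediction_set(tool_fn(), threshold)
        if len(result) == 1:
            break  # unambiguous answer: stop early
    return result, spent


if __name__ == "__main__":
    # Made-up calibration scores and tool outputs, for demonstration only.
    cal = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    t = conformal_threshold(cal, alpha=0.25)
    cheap = lambda: {"a": 0.7, "b": 0.25, "c": 0.05}
    strong = lambda: {"a": 0.9, "b": 0.08, "c": 0.02}
    print(run_node([(1, cheap, t), (3, strong, t)], budget=5))
```

In this toy run the cheap tool leaves two candidate labels, so the controller pays for the stronger tool; with a smaller budget it would return the larger set instead of guessing, which is the accuracy-compute trade-off the abstract describes.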
Problem

Research questions and friction points this paper is trying to address.

multimodal reasoning
reliability guarantees
error compounding
hallucinations
compute efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

conformal prediction
multimodal reasoning
tool-using framework
executable reasoning graph
compute-efficient inference