🤖 AI Summary
This work addresses the unreliability of multimodal reasoning, which often leads to error propagation and hallucination due to insufficient uncertainty calibration. The authors propose a novel executable reasoning graph framework that models perceptual and logical operations as nodes producing conformal prediction sets, thereby providing calibrated, stepwise uncertainty guarantees. A lightweight controller dynamically schedules tool invocations based on available computational budget. This approach establishes, for the first time, a compositional conformal guarantee mechanism that enables verifiable, evidence-backed reasoning, suppresses error accumulation, and allows controllable trade-offs between computation and accuracy. Experiments demonstrate consistent superiority over strong baselines across document, chart, and multi-image question-answering benchmarks in terms of performance, reliability, and computational efficiency.
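The nodes described above each emit a conformal prediction set rather than a single label. The paper does not publish an implementation, so the following is only a minimal sketch of standard split conformal prediction, which is the textbook way to produce such calibrated sets; all function and variable names are illustrative, not from the paper.

```python
import numpy as np

def calibrate_threshold(cal_scores, alpha=0.1):
    """Quantile of nonconformity scores from a held-out calibration set.

    cal_scores[i] is the nonconformity of the true label for calibration
    example i (e.g. 1 - softmax probability assigned to the true label).
    """
    n = len(cal_scores)
    # Finite-sample-corrected quantile level gives >= 1 - alpha coverage.
    q = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def conformal_set(probs, threshold):
    """All labels whose nonconformity (1 - prob) falls within the threshold."""
    return [k for k, p in enumerate(probs) if 1 - p <= threshold]
```

Under exchangeability of calibration and test data, the returned set contains the true label with probability at least 1 - alpha; a larger set signals higher stepwise uncertainty, which is the signal the controller consumes.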
📝 Abstract
We present Proof-of-Perception (PoP), a tool-using framework that casts multimodal reasoning as an executable graph with explicit reliability guarantees. Each perception or logic node outputs a conformal set, yielding calibrated, stepwise uncertainty; a lightweight controller uses these certificates to allocate compute under a budget, expanding with extra tool calls only when needed and stopping early otherwise. This grounds answers in verifiable evidence, reduces error compounding and hallucinations, and enables principled accuracy-compute trade-offs. Across document, chart, and multi-image QA benchmarks, PoP improves performance and reliability over strong chain-of-thought, ReAct-style, and program-of-thought baselines while using computation more efficiently.
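The controller's expand-or-stop policy can be sketched in a few lines: keep invoking refinement tools while the current node's conformal set is still ambiguous and budget remains, and stop early once the certificate is tight. This is a hypothetical reconstruction of the behavior the abstract describes, not the authors' code; `refine_tools` and unit per-call cost are assumptions made for illustration.

```python
def run_node(initial_set, refine_tools, budget):
    """Return (answer_set, cost).

    initial_set  -- conformal set from the node's first (cheap) pass
    refine_tools -- callables mapping a prediction set to a (hopefully
                    smaller) one, each charged at unit cost
    budget       -- maximum number of extra tool calls for this node
    """
    current, cost = initial_set, 0
    for tool in refine_tools:
        # Stop early: the certificate is already a singleton, or the
        # compute budget for this node is exhausted.
        if len(current) <= 1 or cost >= budget:
            break
        current = tool(current)
        cost += 1
    return current, cost
```

This is where the accuracy-compute trade-off lives: a larger `budget` lets ambiguous nodes keep shrinking their sets, while confident nodes return immediately at zero extra cost.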