Failure is Feedback: History-Aware Backtracking for Agentic Traversal in Multimodal Graphs

📅 2026-02-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing open-domain multimodal document retrieval methods rely on a uniform similarity metric, so they cannot adapt to semantic shifts between hops or recover from failed retrieval paths. This work formulates subgraph retrieval as a sequential decision-making process and introduces a history-aware backtracking mechanism that leverages contextual information from prior failed attempts. It also devises a cost-aware agentic traversal strategy that invokes expensive large language model inference only when necessary. By integrating multimodal graph traversal, dynamic backtracking, and economically rational scheduling, the proposed approach achieves state-of-the-art retrieval performance on the MultimodalQA, MMCoQA, and WebQA benchmarks.

📝 Abstract

Open-domain multimodal document retrieval aims to retrieve specific components (paragraphs, tables, or images) from large and interconnected document corpora. Existing graph-based retrieval approaches typically rely on a uniform similarity metric that overlooks hop-specific semantics, and their rigid pre-defined plans hinder dynamic error correction. These limitations suggest that a retriever should adapt its reasoning to the evolving context and recover intelligently from dead ends. To address these needs, we propose Failure is Feedback (FiF), which casts subgraph retrieval as a sequential decision process and introduces two key innovations. (i) We introduce a history-aware backtracking mechanism; unlike standard backtracking that simply reverts the state, our approach piggybacks on the context of failed traversals, leveraging insights from previous failures. (ii) We implement an economically rational agentic workflow. Unlike conventional agents with static strategies, our orchestrator employs a cost-aware traversal method to dynamically manage the trade-off between retrieval accuracy and inference costs, escalating to intensive LLM-based reasoning only when the prior failure justifies the additional computational investment. Extensive experiments show that FiF achieves state-of-the-art retrieval on the MultimodalQA, MMCoQA, and WebQA benchmarks.
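
The two ideas in the abstract can be illustrated with a toy traversal loop. The sketch below is not the FiF implementation: the graph, the `traverse` function, both scoring functions, and the `modality` table are hypothetical stand-ins chosen only to show the control flow. It assumes a cheap scorer is used by default, that every dead end is recorded in a failure history instead of being discarded, and that a costlier scorer (standing in for LLM-based reasoning) is consulted only on the step right after a failure, with that history as input.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Graph = Dict[str, List[str]]
History = List[List[str]]
# Hypothetical scorer signature: (query, candidate_node, failed_paths) -> score
Scorer = Callable[[str, str, History], float]


def traverse(
    graph: Graph,
    start: str,
    query: str,
    is_answer: Callable[[str], bool],
    cheap_score: Scorer,
    costly_score: Scorer,
    max_steps: int = 50,
) -> Tuple[Optional[List[str]], History, int]:
    """Depth-first traversal that keeps failed paths and escalates after a failure."""
    path: List[str] = [start]
    failed_paths: History = []   # context kept from dead ends (the "feedback")
    dead: Set[str] = set()       # nodes already shown to lead nowhere
    escalate = False             # set after a failure, cleared after one costly call
    costly_calls = 0

    for _ in range(max_steps):
        if not path:
            break
        node = path[-1]
        if is_answer(node):
            return path, failed_paths, costly_calls

        candidates = [n for n in graph.get(node, []) if n not in path and n not in dead]
        if not candidates:
            # Dead end: record the failed path before reverting the state.
            failed_paths.append(list(path))
            dead.add(path.pop())
            escalate = True
            continue

        if escalate:
            scorer = costly_score  # pay for the expensive scorer only after a failure
            costly_calls += 1
            escalate = False
        else:
            scorer = cheap_score
        path.append(max(candidates, key=lambda n: scorer(query, n, failed_paths)))

    return None, failed_paths, costly_calls


# Toy data (entirely made up): the answer "b1" sits behind the low-similarity node "b".
graph: Graph = {"q": ["a", "b", "c"], "a": ["a1"], "b": ["b1"], "c": ["c1"]}
modality = {"a": "table", "a1": "table", "c": "table", "c1": "table",
            "b": "image", "b1": "image"}
similarity = {"a": 0.9, "c": 0.8, "b": 0.2}


def cheap_score(query: str, node: str, history: History) -> float:
    # Uniform similarity lookup that ignores the failure history.
    return similarity.get(node, 0.5)


def costly_score(query: str, node: str, history: History) -> float:
    # Stand-in for LLM reasoning: avoid modalities that already led to dead ends.
    failed = {modality[n] for p in history for n in p[1:]}
    return 0.0 if modality[node] in failed else 1.0


path, failures, costly_calls = traverse(
    graph, "q", "toy query", lambda n: n == "b1", cheap_score, costly_score
)
print(path, failures, costly_calls)
```

In this toy run the cheap scorer first commits to the table branch, hits a dead end, and only then is the costly scorer consulted. Because the costly scorer sees the failed paths, it steers away from the sibling table branch that plain similarity would have tried next.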

Problem

Research questions and friction points this paper is trying to address.

multimodal retrieval

graph-based retrieval

history-aware backtracking

agentic traversal

open-domain document retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

history-aware backtracking

agentic traversal

cost-aware reasoning