🤖 AI Summary
Existing graph-based navigation methods lack semantic guidance for unexplored regions and are susceptible to erroneous predictions, leading to structural error accumulation that compromises long-term reliability. This work proposes the Hypothesis Graph Refinement (HGR) framework, which models frontier semantic predictions as revisable hypothesis nodes and constructs a dependency-aware graph memory to support goal-directed exploration. Upon detecting conflicts between observations and predictions, HGR triggers a verification-driven cascaded error-correction mechanism that dynamically prunes incorrect subgraphs, enabling the navigation graph to contract rather than grow unidirectionally. The approach integrates vision-language models for contextual semantic prediction and introduces an exploration ranking strategy that jointly considers goal relevance, path cost, and uncertainty. Evaluated on GOAT-Bench, HGR achieves a 72.41% success rate (56.22% SPL) and consistent improvements on A-EQA and EM-EQA, while eliminating roughly 20% of redundant hypothesis nodes and reducing revisits to erroneous regions by 4.5×.
📝 Abstract
Embodied agents must explore partially observed environments while maintaining reliable long-horizon memory. Existing graph-based navigation systems improve scalability, but they often treat unexplored regions as semantically unknown, leading to inefficient frontier search. Although vision-language models (VLMs) can predict frontier semantics, erroneous predictions may be embedded into memory and propagate through downstream inferences, causing structural error accumulation that confidence attenuation alone cannot resolve. These observations call for a framework that can leverage semantic predictions for directed exploration while systematically retracting errors once new evidence contradicts them. We propose Hypothesis Graph Refinement (HGR), a framework that represents frontier predictions as revisable hypothesis nodes in a dependency-aware graph memory. HGR introduces (1) a semantic hypothesis module, which estimates context-conditioned semantic distributions over frontiers and ranks exploration targets by goal relevance, travel cost, and uncertainty, and (2) verification-driven cascade correction, which compares on-site observations against predicted semantics and, upon mismatch, retracts the refuted node together with all its downstream dependents. Unlike additive map-building, this allows the graph to contract by pruning erroneous subgraphs, keeping memory reliable throughout long episodes. We evaluate HGR on multimodal lifelong navigation (GOAT-Bench) and embodied question answering (A-EQA, EM-EQA). HGR achieves a 72.41% success rate and 56.22% SPL on GOAT-Bench, and shows consistent improvements on both QA benchmarks. Diagnostic analysis reveals that cascade correction eliminates approximately 20% of structurally redundant hypothesis nodes and reduces revisits to erroneous regions by 4.5×, with specular and transparent surfaces accounting for 67% of corrected prediction errors.
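The two mechanisms the abstract describes can be sketched in a few lines: a dependency-aware graph whose frontier hypotheses are ranked by a trade-off of goal relevance, travel cost, and uncertainty, and a cascade correction that retracts a refuted node together with everything depending on it. This is a minimal illustration, not the authors' implementation; the class and field names, the linear scoring form, and the weights are all assumptions.

```python
# Illustrative sketch of HGR's graph memory (all names/weights are assumptions).
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    node_id: str
    predicted_label: str          # semantics a VLM predicted for this frontier
    goal_relevance: float         # how well the prediction matches the goal
    travel_cost: float            # estimated path cost to reach the frontier
    uncertainty: float            # spread of the predicted semantic distribution
    parents: set = field(default_factory=set)  # hypotheses this one depends on

class HypothesisGraph:
    def __init__(self):
        self.nodes: dict[str, Hypothesis] = {}

    def add(self, h: Hypothesis):
        self.nodes[h.node_id] = h

    def rank_frontiers(self, w_cost: float = 0.5, w_unc: float = 0.3):
        """Rank exploration targets by goal relevance, travel cost, and
        uncertainty (the linear trade-off here is an illustrative choice)."""
        def score(h: Hypothesis) -> float:
            return h.goal_relevance - w_cost * h.travel_cost - w_unc * h.uncertainty
        return sorted(self.nodes.values(), key=score, reverse=True)

    def cascade_correct(self, refuted_id: str) -> set:
        """On a mismatch between observation and prediction, retract the
        refuted hypothesis and all downstream dependents, so the graph
        contracts instead of growing unidirectionally."""
        removed, stack = set(), [refuted_id]
        while stack:
            nid = stack.pop()
            if nid in removed or nid not in self.nodes:
                continue
            removed.add(nid)
            # any node listing a removed node among its parents is a dependent
            stack.extend(h.node_id for h in self.nodes.values()
                         if nid in h.parents)
        for nid in removed:
            del self.nodes[nid]
        return removed
```

For example, refuting a "kitchen" hypothesis would also retract a dependent "fridge" hypothesis predicted inside it, while unrelated frontiers survive; this is the subgraph pruning the abstract credits with removing redundant nodes and cutting revisits to erroneous regions.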