🤖 AI Summary
This work addresses a limitation of existing indoor navigation systems: in partially observable environments they decide from a single scoring pass, which tends to be overconfident and leads to long-horizon errors and redundant exploration. To mitigate this, the authors propose a dual-stance cooperative debate mechanism that cross-checks decisions from two complementary perspectives: task-scene understanding and safety-information balancing. An evidence-aware consensus arbitrator integrates the arguments from both stances and, when a choice remains uncertain, triggers lightweight micro-probing to verify it. The approach combines dual-stance policy evaluation with cue-grounded argument generation and builds on the zero-shot perception of vision-language models. Evaluated on the HM3Dv1, HM3Dv2, and MP3D benchmarks, the method consistently improves task success and path efficiency while reducing exploration redundancy.
📝 Abstract
Adaptive navigation in unfamiliar indoor environments is crucial for household service robots. Despite advances in zero-shot perception and reasoning from vision-language models, existing navigation systems still rely on single-pass scoring at the decision layer, leading to overconfident long-horizon errors and redundant exploration. To tackle these problems, we propose Dual-Stance Cooperative Debate Navigation (DSCD-Nav), a decision mechanism that replaces one-shot scoring with stance-based cross-checking and evidence-aware arbitration to improve action reliability under partial observability. Specifically, given the same observation and candidate action set, we explicitly construct two stances by conditioning the evaluation on diverse and complementary objectives: a Task-Scene Understanding (TSU) stance that prioritizes goal progress from scene-layout cues, and a Safety-Information Balancing (SIB) stance that emphasizes risk and information value. The stances conduct a cooperative debate and reach a decision by cross-checking their top candidates with cue-grounded arguments. Then, a Navigation Consensus Arbitration (NCA) agent consolidates both sides' reasons and evidence, optionally triggering lightweight micro-probing to verify uncertain choices; the probing disambiguates while preserving NCA's primary intent. Experiments on HM3Dv1, HM3Dv2, and MP3D demonstrate consistent improvements in success and path efficiency while reducing exploration redundancy.
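The decision flow described above (two stances scoring the same candidates, cross-checking of each side's top choice, arbitration, and optional probing) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no scoring rule, threshold, or probing procedure, so the plain score dictionaries, the averaging rule, the `margin` parameter, and the `probe` callback below are all hypothetical stand-ins. In DSCD-Nav the stance evaluations would come from a vision-language model rather than fixed numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class StanceOpinion:
    """Hypothetical container: one stance's score for every candidate action."""
    scores: Dict[str, float]  # candidate action -> score, assumed to lie in [0, 1]

    def top(self) -> str:
        # The candidate this stance prefers most.
        return max(self.scores, key=self.scores.get)


def arbitrate(
    tsu: StanceOpinion,
    sib: StanceOpinion,
    margin: float = 0.15,  # assumed uncertainty threshold, not from the paper
    probe: Optional[Callable[[str], float]] = None,  # assumed micro-probing hook
) -> Tuple[str, bool]:
    """Return (chosen action, whether micro-probing was triggered)."""
    tsu_top, sib_top = tsu.top(), sib.top()
    if tsu_top == sib_top:
        # Both stances agree, so no further checking is needed.
        return tsu_top, False

    # Cross-check: each stance also rates the other side's top candidate.
    # Averaging the two scores is an illustrative choice only.
    combined = {a: 0.5 * (tsu.scores[a] + sib.scores[a]) for a in (tsu_top, sib_top)}
    best, other = sorted(combined, key=combined.get, reverse=True)

    if combined[best] - combined[other] >= margin or probe is None:
        # Clear winner, or no probing available: keep the arbitrator's choice.
        return best, False

    # Uncertain case: gather extra evidence for the two contenders only.
    evidence = {a: probe(a) for a in (best, other)}
    # On ties, max() keeps `best`, so the arbitrator's primary intent is preserved.
    choice = max((best, other), key=lambda a: combined[a] + evidence[a])
    return choice, True


if __name__ == "__main__":
    tsu = StanceOpinion({"left": 0.9, "forward": 0.4, "right": 0.1})
    sib = StanceOpinion({"left": 0.2, "forward": 0.8, "right": 0.3})
    print(arbitrate(tsu, sib))
    print(arbitrate(tsu, sib, probe=lambda a: 0.2 if a == "left" else 0.0))
```

Restricting the probe to the two contested candidates mirrors the "lightweight" character of micro-probing in the abstract; how the actual system selects and executes probes is not specified there.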