DSCD-Nav: Dual-Stance Cooperative Debate for Object Navigation

📅 2026-01-29
🤖 AI Summary
This work addresses a limitation of existing indoor navigation systems: in partially observable environments they rely on single-score decisions, which tend to be overconfident and lead to long-horizon errors and redundant exploration. The authors propose a dual-stance cooperative debate mechanism that cross-checks decisions from two complementary stances: Task-Scene Understanding (TSU) and Safety-Information Balancing (SIB). An evidence-aware consensus arbitrator consolidates the arguments from both stances and, when a choice remains uncertain, triggers lightweight micro-probing to verify it. The approach combines dual-stance policy evaluation with cue-grounded argument generation and builds on the zero-shot perception of vision-language models. On the HM3Dv1, HM3Dv2, and MP3D benchmarks, the method consistently improves success and path efficiency while reducing exploration redundancy.

📝 Abstract
Adaptive navigation in unfamiliar indoor environments is crucial for household service robots. Despite advances in zero-shot perception and reasoning from vision-language models, existing navigation systems still rely on single-pass scoring at the decision layer, leading to overconfident long-horizon errors and redundant exploration. To tackle these problems, we propose Dual-Stance Cooperative Debate Navigation (DSCD-Nav), a decision mechanism that replaces one-shot scoring with stance-based cross-checking and evidence-aware arbitration to improve action reliability under partial observability. Specifically, given the same observation and candidate action set, we explicitly construct two stances by conditioning the evaluation on diverse and complementary objectives: a Task-Scene Understanding (TSU) stance that prioritizes goal progress from scene-layout cues, and a Safety-Information Balancing (SIB) stance that emphasizes risk and information value. The stances conduct a cooperative debate and form a policy by cross-checking their top candidates with cue-grounded arguments. Then, a Navigation Consensus Arbitration (NCA) agent consolidates both sides' reasons and evidence, optionally triggering lightweight micro-probing to verify uncertain choices, which preserves NCA's primary intent while disambiguating. Experiments on HM3Dv1, HM3Dv2, and MP3D demonstrate consistent improvements in success and path efficiency while reducing exploration redundancy.
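
The decision flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the per-action scores for each stance are assumed to be given as plain dictionaries (in the paper they would come from a vision-language model), and the names `top_candidate`, `arbitrate`, `margin`, and `probe` are hypothetical. The averaging rule and the margin threshold are likewise illustrative assumptions, since the abstract does not specify how NCA weighs evidence.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical representation: each stance rates every candidate action in [0, 1].
Scores = Dict[str, float]


def top_candidate(scores: Scores) -> str:
    """Return the action a stance rates highest."""
    return max(scores, key=scores.get)


def arbitrate(
    tsu: Scores,
    sib: Scores,
    margin: float = 0.05,
    probe: Optional[Callable[[str], float]] = None,
) -> Tuple[str, bool]:
    """Pick an action from two stance score tables.

    Returns (chosen_action, probed). The averaging rule and the margin
    threshold are assumptions made for illustration only.
    """
    tsu_top, sib_top = top_candidate(tsu), top_candidate(sib)

    # Both stances agree: no debate is needed.
    if tsu_top == sib_top:
        return tsu_top, False

    # Cross-check: score each stance's top candidate under both stances.
    support = {a: 0.5 * (tsu[a] + sib[a]) for a in (tsu_top, sib_top)}

    # If the two candidates are nearly tied, optionally gather extra evidence.
    probed = False
    if abs(support[tsu_top] - support[sib_top]) < margin and probe is not None:
        probed = True
        for action in support:
            support[action] += probe(action)  # probe returns an evidence bonus

    return max(support, key=support.get), probed


# Example: the stances disagree, and the combined support favors "forward".
tsu_scores = {"left": 0.8, "forward": 0.6, "right": 0.1}
sib_scores = {"left": 0.3, "forward": 0.7, "right": 0.2}
choice, used_probe = arbitrate(tsu_scores, sib_scores)
```

In this sketch the probe is only consulted when the cross-checked candidates are closer than `margin`, which mirrors the abstract's description of micro-probing as an optional step reserved for uncertain choices.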
Problem

Research questions and friction points this paper is trying to address.

object navigation
zero-shot perception
decision-making
partial observability
redundant exploration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Stance Debate
Cooperative Navigation
Evidence-Aware Arbitration
Partial Observability
Vision-Language Navigation
Authors
Weitao An
School of Electronic Engineering, Xidian University
Qi Liu
School of Electronic Engineering, Xidian University
Chenghao Xu
EPFL
Robotics, Dynamic SLAM, Active Vision
Jiayi Chai
School of Electronic Engineering, Xidian University
Xu Yang
School of Electronic Engineering, Xidian University
Kun Wei
School of Computer Science, Northwestern Polytechnical University
deep learning, computer science, speech
Cheng Deng
University of Edinburgh
On-device LLM, NLP, GeoAI