Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses ambiguous user queries that fail to uniquely specify a target instance. In this setting, existing methods are easily misled by visually similar distractors: they commit prematurely to the first plausible candidate or elicit lengthy descriptions from the user. The authors propose ProCompNav, a two-stage proactive navigation framework that first constructs a pool of candidate instances and then, at each round, asks a binary yes/no question about an attribute-value pair that splits the current pool, pruning all inconsistent candidates at once. Disambiguation is thereby reframed as interactive, pool-level discriminative questioning rather than detailed attribute description of individual instances. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines given the same minimal input and over non-interactive baselines given detailed descriptions, while substantially reducing Response Length; it also achieves state-of-the-art Success Rate on TextNav.

📝 Abstract

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only for the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and over non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors.
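
The questioning loop described in the abstract (pick an attribute-value pair that splits the pool, ask a yes/no question, prune inconsistent candidates) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each candidate is already reduced to a hypothetical attribute dictionary, and it uses a "most balanced split" heuristic for choosing the pair, which the abstract does not specify. The names `pick_question`, `disambiguate`, and `answer` are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: each candidate is a flat mapping of attribute name -> value.
Candidate = Dict[str, str]


def pick_question(pool: List[Candidate]) -> Optional[Tuple[str, str]]:
    """Return the (attribute, value) pair whose yes/no answer splits the pool most evenly."""
    counts = Counter((attr, val) for cand in pool for attr, val in cand.items())
    best, best_score = None, 0
    for pair, n_yes in counts.items():
        # Score a pair by the size of the smaller side of the split it induces.
        score = min(n_yes, len(pool) - n_yes)
        if score > best_score:
            best, best_score = pair, score
    return best  # None when no pair separates the remaining candidates


def disambiguate(
    pool: List[Candidate],
    answer: Callable[[str, str], bool],
    max_rounds: int = 10,
) -> List[Candidate]:
    """Ask binary questions and prune inconsistent candidates until at most one remains."""
    pool = list(pool)
    for _ in range(max_rounds):
        if len(pool) <= 1:
            break
        pair = pick_question(pool)
        if pair is None:
            break  # remaining candidates are indistinguishable on known attributes
        attr, val = pair
        said_yes = answer(attr, val)  # stands in for the user's yes/no reply
        # Keep only candidates consistent with the reply.
        pool = [c for c in pool if (c.get(attr) == val) == said_yes]
    return pool
```

With four candidates that differ in two binary attributes, a simulated user who answers truthfully about one of them leaves exactly that candidate after two questions, since each balanced question halves the pool. In a real system the attribute dictionaries would have to come from the agent's observations, and `answer` would be replaced by an actual user reply.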

Problem

Research questions and friction points this paper is trying to address.

instance navigation

ambiguous queries

disambiguation

comparative judgment

candidate discrimination

Innovation

Methods, ideas, or system contributions that make the work stand out.

comparative judgment

instance navigation

candidate pool pruning