🤖 AI Summary
This work addresses a critical limitation in existing Vision-and-Language Navigation (VLN) systems, which assume that the target specified in an instruction always exists and thus struggle with instructions based on false premises. The study presents the first systematic investigation of such scenarios, introducing the VLN-NF benchmark, which requires agents to actively explore environments and explicitly determine when a target is “NOT-FOUND.” To tackle this challenge, the authors propose ROAM, a method combining supervised room-level navigation with fine-grained exploration driven by large language models (LLMs) or vision-language models (VLMs) that leverage free-space priors. Key contributions include a scalable data generation pipeline, a novel evaluation metric (REV-SPL), and a two-stage hybrid navigation strategy. Experiments demonstrate that ROAM significantly outperforms baseline methods on VLN-NF; the baselines often fail due to insufficient exploration and the resulting misjudgment.
📝 Abstract
Conventional Vision-and-Language Navigation (VLN) benchmarks assume instructions are feasible and the referenced target exists, leaving agents ill-equipped to handle false-premise goals. We introduce VLN-NF, a benchmark with false-premise instructions in which the target is absent from the specified room, so agents must navigate, gather evidence through in-room exploration, and explicitly output NOT-FOUND. VLN-NF is constructed via a scalable pipeline that rewrites VLN instructions using an LLM and verifies target absence with a VLM, producing plausible yet factually incorrect goals. We further propose REV-SPL to jointly evaluate room reaching, exploration coverage, and decision correctness. To address this challenge, we present ROAM, a two-stage hybrid method that combines supervised room-level navigation with LLM/VLM-driven in-room exploration guided by a free-space clearance prior. ROAM achieves the best REV-SPL among compared methods, while baselines often under-explore and terminate prematurely under unreliable instructions. The VLN-NF project page can be found at https://vln-nf.github.io/.