QuASH: Using Natural-Language Heuristics to Query Visual-Language Robotic Maps

📅 2025-10-16

🤖 AI Summary

This work addresses the problem of grounding natural-language queries to the relevant spatial regions of open-vocabulary vision-language maps. It proposes a heuristic semantic matching method that needs only limited training. The core idea is to use lexical-semantic relations, specifically the synonyms and antonyms of the query, to estimate the region of the embedding space relevant to that query, and then to train a lightweight classifier that partitions the environment into matches and non-matches. The method is agnostic to the map representation and to the encoder that produces the embeddings. Experiments on both robotic maps and standard image benchmarks show improved queryability. Because it relies on a small classifier rather than large-scale supervised training, the approach is a low-dependency option for semantic navigation and instruction-driven robotic systems.

📝 Abstract

Embeddings from Visual-Language Models are increasingly utilized to represent semantics in robotic maps, offering an open-vocabulary scene understanding that surpasses traditional, limited labels. Embeddings enable on-demand querying by comparing embedded user text prompts to map embeddings via a similarity metric. The key challenge in performing the task indicated in a query is that the robot must determine the parts of the environment relevant to the query. This paper proposes a solution to this challenge. We leverage natural-language synonyms and antonyms associated with the query within the embedding space, applying heuristics to estimate the language space relevant to the query, and use that to train a classifier to partition the environment into matches and non-matches. We evaluate our method through extensive experiments, querying both maps and standard image benchmarks. The results demonstrate increased queryability of maps and images. Our querying technique is agnostic to the representation and encoder used, and requires limited training.
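
The baseline querying step described in the abstract, comparing an embedded text prompt to map embeddings with a similarity metric, can be sketched as follows. This is a minimal illustration only: the vectors are hand-written toy values rather than outputs of a real vision-language encoder, cosine similarity is assumed as the metric, and `query_map` and its `threshold` parameter are hypothetical names that do not come from the paper.

```python
import numpy as np

def cosine_similarity(query, cells):
    """Cosine similarity between one query vector and each row of `cells`."""
    query = query / np.linalg.norm(query)
    cells = cells / np.linalg.norm(cells, axis=1, keepdims=True)
    return cells @ query

def query_map(query_embedding, map_embeddings, threshold=0.5):
    """Score every map cell and flag those at or above a fixed threshold.

    `threshold` is a hand-picked assumption; choosing it well is exactly
    the difficulty the abstract points to.
    """
    scores = cosine_similarity(query_embedding, map_embeddings)
    return scores, scores >= threshold

# Toy 3-D "embeddings": one row per map cell (assumed values).
map_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query_embedding = np.array([1.0, 0.0, 0.0])

scores, matches = query_map(query_embedding, map_embeddings, threshold=0.8)
print(matches)
```

A single global threshold like this rarely transfers between queries, which is one way to read the challenge of deciding which parts of the environment are relevant to a query.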

Problem

Research questions and friction points this paper is trying to address.

Identifying environment parts relevant to natural language queries

Leveraging synonyms and antonyms in embedding space

Training classifiers to partition environments for query matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses synonyms and antonyms in embedding space

Applies heuristics to estimate relevant language space

Trains a classifier to partition the environment into matches and non-matches
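
The three points above could be combined roughly as in the sketch below. It is not the authors' implementation: this summary does not specify the heuristics, the classifier type, or the encoder, so the sketch assumes synthetic vectors in place of real synonym and antonym embeddings and a plain logistic regression as the classifier. The helpers `make_cluster` and `fit_logistic` are hypothetical names introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # assumed toy embedding size

def unit(v):
    """Normalize vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def make_cluster(center, n, noise=0.05):
    """Synthetic stand-in for embedding n related phrases with a text encoder."""
    return unit(center + noise * rng.standard_normal((n, DIM)))

def fit_logistic(X, y, lr=1.0, steps=500):
    """Logistic regression trained with batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

# Assumed directions for the query concept and its opposite.
concept = np.eye(DIM)[0]
opposite = np.eye(DIM)[1]

# Synonyms act as positive examples, antonyms as negative examples.
synonym_emb = make_cluster(concept, 5)
antonym_emb = make_cluster(opposite, 5)
X = np.vstack([synonym_emb, antonym_emb])
y = np.concatenate([np.ones(5), np.zeros(5)])
w, b = fit_logistic(X, y)

# Partition toy map cells into matches (True) and non-matches (False).
map_embeddings = np.vstack([make_cluster(concept, 3), make_cluster(opposite, 3)])
labels = (map_embeddings @ w + b) > 0.0
print(labels)
```

Training on only a handful of text embeddings per query keeps the cost small, which is in the spirit of the "limited training" the abstract mentions, and the classifier never needs to know how the map embeddings were produced.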