Searching in Space and Time: Unified Memory-Action Loops for Open-World Object Retrieval

📅 2025-11-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Service robots operating in open, dynamic environments must jointly reason over spatial cues (e.g., "the cup on the table") and temporal cues (e.g., "the cup that was here yesterday") to retrieve objects, yet existing approaches are limited to static scene graphs, lack embodied interaction, or assume closed-vocabulary recognition. This paper introduces STAR, the first framework to unify memory queries and embodied actions for spatiotemporal object search. STAR integrates non-parametric long-term memory with a working memory to support open-vocabulary grounding and continual learning, and it leverages a vision-language model to drive embodied actions, enabling joint spatial-temporal reasoning and active interaction within dynamic scene graphs. Evaluated on the STARBench benchmark and in real-world experiments with a Tiago robot, STAR significantly outperforms scene-graph-based and memory-only baselines, substantially improving object retrieval in open-world settings.

📝 Abstract
Service robots must retrieve objects in dynamic, open-world settings where requests may reference attributes ("the red mug"), spatial context ("the mug on the table"), or past states ("the mug that was here yesterday"). Existing approaches capture only parts of this problem: scene graphs capture spatial relations but ignore temporal grounding, temporal reasoning methods model dynamics but do not support embodied interaction, and dynamic scene graphs handle both but remain closed-world with fixed vocabularies. We present STAR (SpatioTemporal Active Retrieval), a framework that unifies memory queries and embodied actions within a single decision loop. STAR leverages non-parametric long-term memory and a working memory to support efficient recall, and uses a vision-language model to select either temporal or spatial actions at each step. We introduce STARBench, a benchmark of spatiotemporal object search tasks across simulated and real environments. Experiments in STARBench and on a Tiago robot show that STAR consistently outperforms scene-graph and memory-only baselines, demonstrating the benefits of treating search in time and search in space as a unified problem.
Problem

Research questions and friction points this paper is trying to address.

Retrieving objects using spatial, temporal, and attribute references in open-world settings
Unifying memory queries and embodied actions for spatiotemporal object search
Overcoming limitations of scene graphs and temporal reasoning in dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified memory-action loops for open-world retrieval
Leveraging non-parametric long-term memory with working memory
Vision-language model selecting temporal or spatial actions
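The contributions above can be sketched as a single decision loop. The following is a minimal, hypothetical Python sketch of how such a memory-action loop might be wired together; all names (`Observation`, `retrieve`, `select_action`, the keyword-match recall standing in for open-vocabulary grounding, and the rule-based policy standing in for the vision-language model) are illustrative assumptions, not code from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a unified memory-action loop (illustrative only, not
# the authors' implementation). Long-term memory is non-parametric: a plain
# list of timestamped observations, searched by substring match as a crude
# stand-in for open-vocabulary grounding.

@dataclass
class Observation:
    time: int
    location: str
    description: str

@dataclass
class WorkingMemory:
    evidence: list = field(default_factory=list)

def query_long_term(memory, query):
    """Temporal action: recall past observations matching the query."""
    return [o for o in memory if query in o.description]

def observe(world, location):
    """Spatial action: inspect one location in the current scene."""
    return world.get(location, [])

def select_action(query, wm):
    """Stand-in for the VLM policy: recall from memory first, then search space."""
    return "temporal" if not wm.evidence else "spatial"

def retrieve(query, long_term, world, max_steps=5):
    """Unified loop: alternate memory queries and embodied actions."""
    wm = WorkingMemory()
    for _ in range(max_steps):
        if select_action(query, wm) == "temporal":
            wm.evidence = query_long_term(long_term, query)
            if not wm.evidence:
                return None  # nothing known about this object
        else:
            # Act on the most recent remembered sighting.
            loc = wm.evidence[-1].location
            found = [d for d in observe(world, loc) if query in d]
            if found:
                return loc, found[0]
    return None
```

The point of the sketch is the control flow, not the components: search in time (memory recall) and search in space (moving and observing) are steps of one loop, chosen per-step by a policy, rather than two separate pipelines.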