🤖 AI Summary
To address the relocalization challenge posed by long-term object displacement induced by human activities in dynamic environments, this paper proposes FlowMaps, a model based on Flow Matching for dynamic object relocalization. FlowMaps exploits the habits and repetitive patterns that govern human–object interactions to infer multimodal distributions over likely object locations across space and time. Unlike approaches that assume static scenes or only short-term dynamics, it targets persistent, activity-driven changes in settings such as households and warehouses, where previously detected objects may be moved or removed between a robot's observations, forcing the robot to find them again before completing its task. The reported results provide statistical evidence supporting this approach, and the code is publicly available.
📝 Abstract
Task and motion planning are long-standing challenges in robotics, especially when robots have to deal with dynamic environments exhibiting long-term dynamics, such as households or warehouses. In these environments, long-term dynamics mostly stem from human activities, since previously detected objects can be moved or removed from the scene. This makes it necessary to find such objects again before completing the assigned task, increasing the risk of failure due to missed relocalizations. However, in these settings, the nature of such human–object interactions is often overlooked, despite being governed by common habits and repetitive patterns. Our conjecture is that these cues can be exploited to recover the most likely object positions in the scene, helping to address the problem of object relocalization in changing environments. To this end, we propose FlowMaps, a model based on Flow Matching that is able to infer multimodal object locations over space and time. Our results present statistical evidence to support our hypotheses, opening the way to more complex applications of our approach. The code is publicly available at https://github.com/Fra-Tsuna/flowmaps
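To make the Flow Matching idea concrete, the sketch below builds the standard conditional flow matching training targets for a toy bimodal distribution of 2D object positions (e.g. an object that is usually in one of two places). This is a minimal illustration of the generic technique, not the FlowMaps implementation; all names (`cfm_training_pairs`, `cfm_loss`) and the linear interpolation path are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def cfm_training_pairs(x1, rng):
    """Build one conditional-flow-matching training example per data point.

    x1: (N, D) samples from the target distribution
        (here, observed 2D object positions).
    Returns (t, x_t, u_t): sampled time, interpolated point on the
    linear probability path, and the target velocity to regress onto.
    """
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))   # noise samples from the base distribution
    t = rng.uniform(size=(n, 1))       # t ~ U(0, 1)
    x_t = (1.0 - t) * x0 + t * x1      # linear interpolation path
    u_t = x1 - x0                      # target velocity field along that path
    return t, x_t, u_t


def cfm_loss(v_pred, u_t):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((v_pred - u_t) ** 2))


# Toy multimodal target: the object is usually at one of two spots.
x1 = np.concatenate([
    rng.normal(loc=[-2.0, 0.0], scale=0.1, size=(64, 2)),
    rng.normal(loc=[2.0, 0.0], scale=0.1, size=(64, 2)),
])
t, x_t, u_t = cfm_training_pairs(x1, rng)

# Loss of a trivial predictor that always outputs zero velocity;
# a learned network v_theta(x_t, t) would be trained to drive this down.
loss = cfm_loss(np.zeros_like(u_t), u_t)
```

A trained velocity network would then be integrated from noise (e.g. with a few Euler steps from t = 0 to t = 1) to draw samples, which is what allows the model to represent several plausible object locations at once rather than a single point estimate.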