🤖 AI Summary
Existing spatial semantic representations struggle to effectively reason about structured temporal dynamics—such as the periodic movement of household objects—in semi-static environments. This work proposes PredictiveGraphs, a predictive 3D scene graph that integrates spatiotemporal and semantic information by embedding Perpetua* Bayesian filters directly into inter-node relationships, enabling temporal modeling and future prediction of object states. By jointly modeling spatiotemporal-semantic relations and performing recurrent state inference, the approach maintains robustness under distributional shifts. Evaluated over three-week navigation tasks in both simulation and real-world settings—with environmental changes occurring every two hours—the method significantly outperforms current baselines in accurately forecasting the dynamic evolution of the environment.
📝 Abstract
We have seen tremendous recent progress in our ability to build "spatio-semantic" representations that enable robots to perform complex reasoning across geometry and semantics. However, the vast majority of these methods lack any ability to perform reasoning across time. This is a desirable property in situations where a robot repeatedly observes an environment where instances may change in between observations, but in a structured way. Consider as an example a home environment where the location of a mug typically moves from the cupboard to a countertop to the sink and then back to the cupboard on a daily basis. We should be able to learn this cyclic behavior and use it to predict the state of the mug in the future. In this work, we propose a method that is able to perform this type of tempo-spatio-semantic reasoning. Underpinning the method is a filter, Perpetua$^*$, that performs Bayesian reasoning on the states of the environment that are observed over time. This filter is integrated within a 3D scene graph structure that we call PredictiveGraphs, where nodes represent objects and edges function as Perpetua$^*$ filters encoding spatio-semantic relationships. We validate the method in both simulation and real-world dynamic navigation tasks, where our real world experiments consist of an environment that is undergoing semi-static changes at a bi-hourly frequency over a period of three weeks. In both settings, we demonstrate that our method outperforms baselines in predicting future environment states, even in the presence of distributional shifts.