🤖 AI Summary
This work addresses semantic navigation in map-free, partially observable 3D environments. We propose an end-to-end reinforcement learning framework grounded in a learnable open-vocabulary 3D scene graph. Methodologically, our approach uniquely integrates: (i) dynamic scene graph neural networks to jointly model spatial and semantic relationships among objects; (ii) curriculum learning for progressive policy training; and (iii) imitation learning (behavioral cloning) for policy initialization. The navigation policy is optimized via Proximal Policy Optimization (PPO) within Isaac Sim, enabling joint optimization of semantic spatial reasoning and target localization. Experiments demonstrate substantial improvements in target navigation success rates across complex indoor scenes, along with strong generalization and cross-scene adaptability. The source code is publicly available.
📝 Abstract
A 3D scene graph models spatial relationships between objects, enabling an agent to navigate efficiently in a partially observable environment and to predict the location of a target object. This paper proposes an original framework named SGN-CIRL (3D Scene Graph-Based Reinforcement Learning Navigation) for mapless reinforcement-learning-based robot navigation with a learnable representation of an open-vocabulary 3D scene graph. To accelerate and stabilize the training of reinforcement learning algorithms, the framework also employs imitation learning and curriculum learning: the former enables the agent to learn from demonstrations, while the latter structures the training process by gradually increasing task complexity from simple to more advanced scenarios. Numerical experiments conducted in the Isaac Sim environment showed that using a 3D scene graph for reinforcement learning significantly increased the success rate in difficult navigation cases. The code is open-sourced and available at: https://github.com/Xisonik/Aloha_graph.
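To make the core idea concrete, here is a minimal, hypothetical sketch (not the SGN-CIRL implementation; all names, dimensions, and the single GCN-style layer are illustrative assumptions) of how a scene graph's node features and spatial edges can be pooled into a graph embedding that conditions a policy's action distribution:

```python
import numpy as np

# Hypothetical sketch, NOT the paper's code: a toy scene graph with
# semantic node features and spatial edges, one round of GCN-style
# message passing, and a linear policy head over discrete actions.

rng = np.random.default_rng(0)

# Toy scene graph: 4 objects with 8-dim semantic features
# (e.g. open-vocabulary embeddings); adjacency encodes spatial
# relations such as "near" or "on".
num_nodes, feat_dim, action_dim = 4, 8, 3
X = rng.normal(size=(num_nodes, feat_dim))          # node features
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)           # spatial edges
A_hat = A + np.eye(num_nodes)                       # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))            # degree normalization

W = rng.normal(size=(feat_dim, feat_dim)) * 0.1     # message-passing weights
H = np.maximum(D_inv @ A_hat @ X @ W, 0.0)          # one graph layer + ReLU
g = H.mean(axis=0)                                  # graph-level embedding

W_pi = rng.normal(size=(feat_dim, action_dim)) * 0.1
logits = g @ W_pi                                   # policy logits
probs = np.exp(logits - logits.max())
probs /= probs.sum()                                # softmax over actions
print(probs)
```

In the actual framework this embedding would be learned jointly with the policy and optimized with PPO; the sketch only shows the data flow from graph to action distribution.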