🤖 AI Summary
This work addresses the limited multimodal perception and human-centric awareness of autonomous agents operating in human-dense environments by proposing and open-sourcing OpenSocInt, a modular, extensible simulation platform for multimodal social interaction. OpenSocInt is the first framework to unify multimodal perception (including vision, audio, and social signals), feature fusion mechanisms, and the training of social navigation policies within a single architecture, supporting learning paradigms such as reinforcement learning and behavioral cloning. Experimental results demonstrate that OpenSocInt flexibly accommodates diverse perception modalities and agent architectures, significantly enhancing agents' navigation and interaction capabilities in complex social scenarios.
📄 Abstract
In this paper, we introduce OpenSocInt, an open-source software package providing a simulator for multi-modal social interactions and a modular architecture for training social agents. We describe the software package and showcase its utility via an experimental protocol based on the task of social navigation. Our framework allows for exploring the use of different perceptual features, their encoding and fusion, as well as the use of different agents. The software is publicly available under the GPL at https://gitlab.inria.fr/robotlearn/OpenSocInt/.
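To make the modular perception/fusion/policy architecture described above concrete, here is a minimal Python sketch of how such a pipeline could be composed. All class and method names (`VisionEncoder`, `ConcatFusion`, etc.) are purely illustrative assumptions for this sketch; they are not OpenSocInt's actual API.

```python
import numpy as np

# Hypothetical sketch of a modular multimodal pipeline: per-modality
# encoders, a pluggable fusion mechanism, and a swappable policy.
# Names are illustrative only, NOT OpenSocInt's real API.

class VisionEncoder:
    """Encodes a raw image observation into a fixed-size feature vector."""
    def __init__(self, dim=8):
        self.dim = dim

    def encode(self, image):
        # Stand-in for a CNN: pool the image into `dim` coarse statistics.
        flat = np.asarray(image, dtype=float).ravel()
        return np.array([c.mean() for c in np.array_split(flat, self.dim)])

class AudioEncoder:
    """Encodes a raw audio frame into a fixed-size feature vector."""
    def __init__(self, dim=4):
        self.dim = dim

    def encode(self, audio):
        flat = np.asarray(audio, dtype=float).ravel()
        return np.array([c.mean() for c in np.array_split(flat, self.dim)])

class ConcatFusion:
    """Simplest fusion mechanism: concatenate per-modality features."""
    def fuse(self, features):
        return np.concatenate(features)

class RandomPolicy:
    """Placeholder agent mapping fused features to a 2-D velocity command."""
    def __init__(self, feature_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(2, feature_dim))

    def act(self, fused):
        return np.tanh(self.w @ fused)

# Compose the pipeline: swapping an encoder, the fusion mechanism, or the
# policy only changes which object is plugged in, not the surrounding code.
encoders = {"vision": VisionEncoder(dim=8), "audio": AudioEncoder(dim=4)}
obs = {"vision": np.ones((16, 16)), "audio": np.zeros(64)}
feats = [encoders[name].encode(x) for name, x in obs.items()]
fused = ConcatFusion().fuse(feats)
policy = RandomPolicy(feature_dim=fused.size)
action = policy.act(fused)
print(fused.shape, action.shape)  # (12,) (2,)
```

A reinforcement-learning or behavioral-cloning loop would then train the policy component on rollouts from the simulator, leaving the encoders and fusion module interchangeable, which is the kind of flexibility the experimental protocol explores.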