🤖 AI Summary
Existing platforms struggle to support the coexistence and fair comparison of reinforcement learning agents, large language models, vision-language models, and human decision-makers within a unified environment. To address this gap, this work proposes an open-source, visualization-first platform that enables plug-and-play integration of heterogeneous agents through inter-process communication (IPC) isolation, a unified action interface abstraction, and deterministic simulation control. The platform supports both manual, fine-grained observation and scripted automated evaluation, ensuring reproducibility across diverse agent paradigms. This study presents the first framework to achieve seamless collaboration and equitable benchmarking among multiple agent types in a shared environment, thereby establishing a reliable infrastructure for hybrid multi-agent research.
📝 Abstract
Reinforcement learning (RL), large language models (LLMs), and vision-language models (VLMs) have been widely studied in isolation. However, existing infrastructure cannot deploy agents from different decision-making paradigms within the same environment, making it difficult to study them in hybrid multi-agent settings or to compare their behaviour fairly under identical conditions. We present MOSAIC, an open-source platform that bridges this gap by incorporating a diverse set of existing reinforcement learning environments and enabling heterogeneous agents (RL policies, LLMs, VLMs, and human players) to operate within them in ad-hoc team settings with reproducible results. MOSAIC introduces three contributions. (i) An IPC-based worker protocol that wraps both native and third-party frameworks as isolated subprocess workers; each worker executes its native training and inference logic unmodified and communicates through a versioned inter-process protocol. (ii) An operator abstraction that forms an agent-level interface by mapping workers to agents: every operator, whether backed by an RL policy, an LLM, or a human, conforms to a minimal unified interface. (iii) A deterministic cross-paradigm evaluation framework with two complementary modes: a manual mode that advances up to N concurrent operators in lock-step under shared seeds for fine-grained visual inspection of behavioural differences, and a script mode that drives automated, long-running evaluation through declarative Python scripts for reproducible experiments. We release MOSAIC as an open, visualization-first platform to facilitate reproducible cross-paradigm research across the RL, LLM, and human-in-the-loop communities.
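To make the operator abstraction and the lock-step evaluation mode concrete, here is a minimal Python sketch. All class and function names (`Operator`, `RandomPolicyOperator`, `ScriptedOperator`, `lockstep_rollout`) are hypothetical illustrations of the ideas in the abstract, not MOSAIC's actual API; the "environment" is a seeded random-number stand-in.

```python
from abc import ABC, abstractmethod
import random


class Operator(ABC):
    """Hypothetical minimal unified interface: every operator,
    whether backed by an RL policy, an LLM, or a human, exposes act()."""

    @abstractmethod
    def act(self, observation):
        """Return an action for the given observation."""


class RandomPolicyOperator(Operator):
    """Stand-in for an RL-policy-backed worker (seeded for determinism)."""

    def __init__(self, n_actions, seed):
        self.rng = random.Random(seed)
        self.n_actions = n_actions

    def act(self, observation):
        return self.rng.randrange(self.n_actions)


class ScriptedOperator(Operator):
    """Stand-in for an LLM- or human-backed worker replaying fixed choices."""

    def __init__(self, actions):
        self.actions = list(actions)
        self.step = 0

    def act(self, observation):
        action = self.actions[self.step % len(self.actions)]
        self.step += 1
        return action


def lockstep_rollout(operators, n_steps, seed):
    """Advance all operators in lock-step under a shared seed,
    mirroring the manual evaluation mode described in the abstract."""
    env_rng = random.Random(seed)  # stand-in for a seeded environment
    trace = []
    for _ in range(n_steps):
        observation = env_rng.random()  # stand-in for a real observation
        trace.append([op.act(observation) for op in operators])
    return trace


ops = [RandomPolicyOperator(n_actions=4, seed=0), ScriptedOperator([1, 2, 3])]
print(lockstep_rollout(ops, n_steps=3, seed=123))
```

Because every source of randomness is seeded, running the rollout twice with the same seeds yields identical traces, which is the property the manual mode relies on for fair side-by-side inspection of heterogeneous agents.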