Improving Cooperation in Collaborative Embodied AI

📅 2025-10-03

📈 Citations: 0

✨ Influential: 0

career value

207K/year

🤖 AI Summary

This study addresses key challenges in collaborative embodied AI—namely, low multi-agent cooperation efficiency, insufficient LLM-driven communication and reasoning, and poor naturalness of human–machine interaction—by proposing an enhanced CoELA framework. Methodologically, it introduces a hierarchical prompt engineering strategy tailored for task coordination, integrating role-aware agent instructions, shared memory mechanisms, and dynamic context management; notably, it pioneers end-to-end speech interaction (ASR/TTS) to enable real-time, voice-driven collaborative decision-making. Experiments atop open-source LLMs (e.g., Gemma-3) demonstrate a 22% improvement in task completion efficiency over the original CoELA, alongside显著 gains in user immersion and system iteration speed. Core contributions include: (1) a scalable, LLM-based multi-agent prompting paradigm; (2) a speech-augmented embodied collaboration architecture; and (3) lightweight deployment validation in realistic scenarios.

Technology Category

Application Category

📝 Abstract

The integration of Large Language Models (LLMs) into multiagent systems has opened new possibilities for collaborative reasoning and cooperation with AI agents. This paper explores different prompting methods and evaluates their effectiveness in enhancing agent collaborative behaviour and decision-making. We enhance CoELA, a framework designed for building Collaborative Embodied Agents that leverage LLMs for multi-agent communication, reasoning, and task coordination in shared virtual spaces. Through systematic experimentation, we examine different LLMs and prompt engineering strategies to identify optimised combinations that maximise collaboration performance. Furthermore, we extend our research by integrating speech capabilities, enabling seamless collaborative voice-based interactions. Our findings highlight the effectiveness of prompt optimisation in enhancing collaborative agent performance; for example, our best combination improved the efficiency of the system running with Gemma3 by 22% compared to the original CoELA system. In addition, the speech integration provides a more engaging user interface for iterative system development and demonstrations.

Problem

Research questions and friction points this paper is trying to address.

Enhancing collaborative behavior in multi-agent AI systems through prompting

Optimizing LLM integration for embodied agent communication and coordination

Improving task efficiency in shared virtual environments using speech capabilities

Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced CoELA framework for collaborative embodied agents

Optimized prompting methods to improve multi-agent cooperation

Integrated speech capabilities for voice-based agent interactions

🔎 Similar Papers

Enabling Multi-Robot Collaboration from Single-Human Guidance