LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics

📅 2025-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses long-term object management and the execution of high-level commands by embodied robots in home environments. We propose a fine-tuning-free, LLM-driven multi-agent architecture with three specialized agents: a routing agent, a task planning agent, and a knowledge base agent. Using in-context learning and retrieval-augmented generation (RAG), the architecture supports memory-augmented task planning, object state tracking across interactions, and semantic scene understanding. The end-to-end embodied system combines multimodal foundation models, including Grounded SAM, LLaMA3.2-Vision, Qwen2.5, and LLaMA3.1, to perceive, reason, and act. Experiments in three household scenarios show high task planning accuracy, and RAG improves long-term memory recall; Qwen2.5 performs best for the specialized agents, while LLaMA3.1 performs best at routing. To our knowledge, this is the first memory-augmented multi-agent paradigm tailored to domestic settings, offering a scalable, fine-tuning-free architectural path for embodied AI.
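The retrieval step behind the long-term memory can be illustrated with a minimal sketch, assuming past interactions are stored as plain-text log entries and ranked by bag-of-words cosine similarity. The class `InteractionMemory` and its methods are hypothetical, and the word-count vectors are a crude stand-in for learned embeddings; the paper's actual RAG setup (embedding model, storage, prompt format) is not described here.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> Counter:
    """Lowercase word counts; a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InteractionMemory:
    """Hypothetical store of past robot actions, queried by similarity."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def add(self, entry: str) -> None:
        """Log one past interaction as free text."""
        self.entries.append(entry)

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        """Return the k stored entries most similar to the query."""
        q = _tokens(query)
        ranked = sorted(self.entries, key=lambda e: _cosine(q, _tokens(e)), reverse=True)
        return ranked[:k]


memory = InteractionMemory()
memory.add("Placed the red mug in the kitchen cabinet.")
memory.add("Moved the book from the sofa to the shelf.")
memory.add("Put the keys in the drawer next to the door.")

print(memory.retrieve("Where is the red mug?", k=1))
# prints ['Placed the red mug in the kitchen cabinet.']
```

The retrieved entries would then be placed into the agent's prompt, which is what lets the system answer questions about earlier actions without any model training.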

📝 Abstract

We present an embodied robotic system with an LLM-driven agent-orchestration architecture for autonomous household object management. The system integrates memory-augmented task planning, enabling robots to execute high-level user commands while tracking past actions. It employs three specialized agents: a routing agent, a task planning agent, and a knowledge base agent, each powered by task-specific LLMs. By leveraging in-context learning, our system avoids the need for explicit model training. RAG enables the system to retrieve context from past interactions, enhancing long-term object tracking. A combination of Grounded SAM and LLaMA3.2-Vision provides robust object detection, facilitating semantic scene understanding for task planning. Evaluation across three household scenarios demonstrates high task planning accuracy and an improvement in memory recall due to RAG. Specifically, Qwen2.5 yields the best performance for the specialized agents, while LLaMA3.1 excels in routing tasks. The source code is available at: https://github.com/marc1198/chat-hsr.
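The routing agent's job, deciding which specialized agent should handle a command, can be sketched as a simple dispatch table. This is a minimal sketch, assuming the router's LLM call is replaced by a hypothetical keyword classifier `classify_intent`; the agent functions, labels, and question-word heuristic are illustrative and not taken from the chat-hsr repository.

```python
from typing import Callable, Dict


def task_planning_agent(command: str) -> str:
    """Hypothetical stand-in for the LLM-backed task planning agent."""
    return f"[planner] planning: {command}"


def knowledge_base_agent(command: str) -> str:
    """Hypothetical stand-in for the LLM-backed knowledge base agent."""
    return f"[knowledge base] recalling: {command}"


# Map each routing label to the agent that handles it.
AGENTS: Dict[str, Callable[[str], str]] = {
    "plan": task_planning_agent,
    "query": knowledge_base_agent,
}

QUESTION_WORDS = {"where", "what", "which", "when", "did", "have", "is", "are"}


def classify_intent(command: str) -> str:
    """Keyword stand-in for the routing LLM: questions go to the knowledge
    base, everything else is treated as an action request for the planner."""
    text = command.strip().lower()
    words = text.split()
    if text.endswith("?") or (words and words[0] in QUESTION_WORDS):
        return "query"
    return "plan"


def route(command: str) -> str:
    """Dispatch a user command to the agent selected by the router."""
    return AGENTS[classify_intent(command)](command)


print(route("Where did you put my keys?"))
# prints [knowledge base] recalling: Where did you put my keys?
print(route("Tidy up the living room table."))
# prints [planner] planning: Tidy up the living room table.
```

Keeping the router separate from the specialized agents is what allows each role to use a different model, as in the reported setup where one LLM is preferred for routing and another for the specialized agents.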

Problem

Research questions and friction points this paper is trying to address.

Enables autonomous household object management using LLM-driven agents

Integrates memory-augmented task planning for tracking past actions

Combines vision and language models for robust object detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven agent-orchestration architecture for household robotics

Memory-augmented task planning with specialized LLM agents

Grounded SAM and LLaMA3.2-Vision for object detection
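Because the system relies on in-context learning rather than fine-tuning, the planner's behavior is shaped by what goes into its prompt. A minimal sketch of that assembly step, assuming object labels have already been produced by the perception stack (Grounded SAM with LLaMA3.2-Vision in the paper) and passed in as plain strings; `build_planner_prompt`, the few-shot example, and the action syntax are hypothetical, and the repository's real prompts may differ.

```python
from typing import List

# Hypothetical few-shot example; the action names are illustrative only.
FEW_SHOT = (
    "Command: Put the apple in the fridge.\n"
    "Plan: 1. navigate(table) 2. pick(apple) 3. navigate(fridge) 4. place(apple, fridge)\n"
)


def build_planner_prompt(command: str, detected_objects: List[str], memories: List[str]) -> str:
    """Assemble an in-context prompt: examples, visible objects, recalled
    past actions, then the new command for the LLM to complete."""
    scene = ", ".join(sorted(set(detected_objects))) or "none"  # de-duplicate labels
    recalled = "\n".join(f"- {m}" for m in memories) or "- none"
    return (
        "You are a household robot task planner.\n\n"
        f"Examples:\n{FEW_SHOT}\n"
        f"Objects currently visible: {scene}\n"
        f"Relevant past actions:\n{recalled}\n\n"
        f"Command: {command}\nPlan:"
    )


prompt = build_planner_prompt(
    "Bring the mug to the sink.",
    detected_objects=["mug", "plate", "mug"],
    memories=["Placed the red mug in the kitchen cabinet."],
)
print(prompt)
```

The string would be sent to the planning LLM, whose completion after `Plan:` is parsed into robot actions; swapping the example or the memory entries changes behavior without retraining.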

🔎 Similar Papers

InteLiPlan: Interactive Lightweight LLM-Based Planner for Domestic Robot Autonomy