LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of long-term object management and high-level command execution for embodied robots in home environments. We propose a fine-tuning-free, LLM-driven multi-agent collaborative architecture comprising three specialized agents: routing, task planning, and knowledge base. Leveraging in-context learning and retrieval-augmented generation (RAG), the architecture enables memory-enhanced task planning, cross-turn object state tracking, and semantic scene understanding. The end-to-end embodied intelligence system integrates multimodal foundation models—including Grounded SAM, LLaMA3.2-Vision, Qwen2.5, and LLaMA3.1—to jointly perceive, reason, and act. Experiments across three household task categories demonstrate significant improvements in planning accuracy; RAG substantially enhances long-term memory recall; and Qwen2.5 and LLaMA3.1 achieve superior performance in planning and routing, respectively. To our knowledge, this is the first memory-augmented multi-agent paradigm tailored to domestic settings, offering a scalable, fine-tuning-free architectural pathway for embodied AI.

📝 Abstract
We present an embodied robotic system with an LLM-driven agent-orchestration architecture for autonomous household object management. The system integrates memory-augmented task planning, enabling robots to execute high-level user commands while tracking past actions. It employs three specialized agents: a routing agent, a task planning agent, and a knowledge base agent, each powered by task-specific LLMs. By leveraging in-context learning, our system avoids the need for explicit model training. RAG enables the system to retrieve context from past interactions, enhancing long-term object tracking. A combination of Grounded SAM and LLaMA3.2-Vision provides robust object detection, facilitating semantic scene understanding for task planning. Evaluation across three household scenarios demonstrates high task planning accuracy and an improvement in memory recall due to RAG. Specifically, Qwen2.5 yields the best performance for specialized agents, while LLaMA3.1 excels in routing tasks. The source code is available at: https://github.com/marc1198/chat-hsr.
Problem

Research questions and friction points this paper is trying to address.

Long-term object management and high-level command execution for robots in home environments
Tracking object state and past actions across conversational turns
Achieving robust object detection and semantic scene understanding for planning
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven agent-orchestration architecture for household robotics
Memory-augmented task planning with specialized LLM agents
Grounded SAM and LLaMA3.2-Vision for object detection
Marc Glocker
AIT Austrian Institute of Technology GmbH, Center for Vision, Automation and Control, 1210 Vienna, Austria
Peter Honig
Automation and Control Institute, Faculty of Electrical Engineering, TU Wien, 1040 Vienna, Austria
Matthias Hirschmanner
TU Wien
Markus Vincze
TU Wien
Robot vision · home robotics · making robots see