🤖 AI Summary
This study addresses the limited task planning capability of service robots in complex domestic environments. We propose a novel architecture centered on large language models (LLMs) as the cognitive engine. Methodologically, the approach integrates pretrained foundation models, domain-specific fine-tuning, retrieval-augmented generation (RAG), and multimodal prompt engineering to enable semantic understanding, high-level task decomposition, and autonomous decision-making from textual, visual, and auditory inputs. Our key contribution is an end-to-end LLM-driven planning framework that eliminates reliance on the explicit symbolic modeling characteristic of traditional planners and systematically identifies current technical bottlenecks and evolutionary pathways. Experimental evaluation and comprehensive review demonstrate substantial improvements in task generalization and environmental adaptability. The work establishes a scalable technical paradigm for AI-robotics integration and delineates concrete directions for future advancement.
📝 Abstract
With the rapid advancement of large language models (LLMs) and robotics, service robots are increasingly becoming an integral part of daily life, offering a wide range of services in complex environments. To deliver these services intelligently and efficiently, robust and accurate task planning capabilities are essential. This paper presents a comprehensive overview of the integration of LLMs into service robotics, with a particular focus on their role in enhancing robotic task planning. First, the development and foundational techniques of LLMs, including pre-training, fine-tuning, retrieval-augmented generation (RAG), and prompt engineering, are reviewed. We then explore the application of LLMs as the cognitive core, or "brain", of service robots, discussing how LLMs contribute to improved autonomy and decision-making. Furthermore, recent advancements in LLM-driven task planning across various input modalities are analyzed, including text, visual, audio, and multimodal inputs. Finally, we summarize key challenges and limitations in current research and propose future directions to advance the task planning capabilities of service robots in complex, unstructured domestic environments. This review aims to serve as a valuable reference for researchers and practitioners in the fields of artificial intelligence and robotics.