LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots

📅 2024-04-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 1
🤖 AI Summary
Household service robots struggle to align with users' personalized preferences due to partial observability and changing preferences. Method: the paper proposes an iterative, scene-graph-driven planning and optimization framework: (1) it constructs a scene graph from local observations to model the partially observable environment; (2) it introduces a two-stage personalization pipeline, first bootstrapping the LLM planner via imitation learning from demonstrations, then refining it via reinforced self-training for continual adaptation to user preferences; (3) it integrates perception, planning, and execution in the Housekeep 3D simulation environment. Results: on the Housekeep benchmark, the method improves task success rate by more than 30% over existing LLM-based planners, demonstrating substantially better alignment with fine-grained human preferences.

📝 Abstract
Large language models (LLMs) have shown significant potential for robotics applications, particularly task planning, by harnessing their language comprehension and text generation capabilities. However, in applications such as household robotics, a critical gap remains in the personalization of these models to individual user preferences. We introduce LLM-Personalize, a novel framework with an optimization pipeline designed to personalize LLM planners for household robotics. Our LLM-Personalize framework features an LLM planner that performs iterative planning in multi-room, partially-observable household scenarios, making use of a scene graph constructed with local observations. The generated plan consists of a sequence of high-level actions which are subsequently executed by a controller. Central to our approach is the optimization pipeline, which combines imitation learning and iterative self-training to personalize the LLM planner. In particular, the imitation learning phase performs initial LLM alignment from demonstrations, and bootstraps the model to facilitate effective iterative self-training, which further explores and aligns the model to user preferences. We evaluate LLM-Personalize on Housekeep, a challenging simulated real-world 3D benchmark for household rearrangements, and show that LLM-Personalize achieves more than a 30 percent increase in success rate over existing LLM planners, showcasing significantly improved alignment with human preferences. Project page: https://gdg94.github.io/projectllmpersonalize/.
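The abstract's two-stage optimization pipeline (imitation learning to bootstrap the planner, then iterative self-training to align it with user preferences) can be sketched roughly as follows. This is an illustrative toy sketch modeled on reinforced self-training, not the authors' implementation: the function names, the action strings, and the threshold-based filtering of sampled plans are all assumptions.

```python
import random

def imitation_bootstrap(demos):
    """Stage 1: seed the training set with human demonstrations."""
    return list(demos)

def score(plan, preferred):
    """Toy preference score: fraction of plan actions matching the user's
    preferred object placements."""
    return sum(a in preferred for a in plan) / len(plan)

def self_train(dataset, preferred, rounds=3, samples=8, threshold=0.5, seed=0):
    """Stage 2: iterative self-training. Each round, sample candidate plans,
    keep only those scoring above the threshold, and fold them back into the
    dataset (standing in for fine-tuning the LLM planner on the kept plans)."""
    rng = random.Random(seed)
    actions = ["put(mug, cabinet)", "put(mug, sink)",
               "put(book, shelf)", "put(book, floor)"]
    for _ in range(rounds):
        candidates = [[rng.choice(actions) for _ in range(3)]
                      for _ in range(samples)]
        kept = [p for p in candidates if score(p, preferred) >= threshold]
        dataset.extend(kept)  # real pipeline: fine-tune the model on `kept`
    return dataset

demos = [["put(mug, cabinet)", "put(book, shelf)"]]
preferred = {"put(mug, cabinet)", "put(book, shelf)"}
data = self_train(imitation_bootstrap(demos), preferred)
```

The key design point mirrored here is that self-training only ever trains on plans the preference signal accepts, so each round shifts the planner toward placements the user actually wants.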
Problem

Research questions and friction points this paper is trying to address.

Personalization
Home Robots
User Preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-Personalize
Advanced Motion Planning
User Preference Adaptation
Dongge Han
Microsoft
LLMs, Recommender Systems, Reinforcement Learning, Multiagent Systems, Game Theory
Trevor A. McInroe
School of Informatics, University of Edinburgh, Edinburgh, UK
Adam Jelley
University of Edinburgh
machine learning, reinforcement learning, representation learning
Stefano V. Albrecht
School of Informatics, University of Edinburgh
Artificial Intelligence, Autonomous Agents, Multi-Agent Systems, Reinforcement Learning
Peter Bell
School of Informatics, University of Edinburgh, Edinburgh, UK
A. Storkey
School of Informatics, University of Edinburgh, Edinburgh, UK