SAS-Prompt: Large Language Models as Numerical Optimizers for Robot Self-Improvement

📅 Published: 2025-04-29
📈 Citations: 0 (influential: 0)

🤖 AI Summary
Traditional robot policy optimization relies on gradient-based learning or fine-tuning, which limits interpretability, generalizability, and applicability in low-data or real-world settings. Method: We propose an iterative, LLM-based self-improvement framework for robot policies that requires neither gradients nor parameter fine-tuning. We first uncover LLMs' intrinsic stochastic numerical optimization capability and design the SAS (Summarize–Analyze–Synthesize) prompting framework, which unifies policy reasoning, trajectory retrieval, feedback synthesis, and policy update within a single prompt. Integrated with a memory of robot-executed trajectories and iterative human- or environment-derived feedback, the framework enables autonomous policy evolution. Contribution/Results: Evaluated on simulated and real-world tabletop ping-pong tasks, our method significantly improves task success rate and cross-scenario behavioral generalization. The results empirically validate LLMs as effective, interpretable, gradient-free policy optimizers.

📝 Abstract
We demonstrate the ability of large language models (LLMs) to perform iterative self-improvement of robot policies. An important insight of this paper is that LLMs have a built-in ability to perform (stochastic) numerical optimization and that this property can be leveraged for explainable robot policy search. Based on this insight, we introduce the SAS Prompt (Summarize, Analyze, Synthesize) -- a single prompt that enables iterative learning and adaptation of robot behavior by combining the LLM's ability to retrieve, reason and optimize over previous robot traces in order to synthesize new, unseen behavior. Our approach can be regarded as an early example of a new family of explainable policy search methods that are entirely implemented within an LLM. We evaluate our approach both in simulation and on a real-robot table tennis task. Project website: sites.google.com/asu.edu/sas-llm/
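The abstract describes a single prompt that summarizes past robot traces, analyzes them, and synthesizes a new policy, with each executed trial fed back into memory. A minimal sketch of such a loop is below; the prompt wording, the `Trace` memory format, the `parse_params` extraction, and the injected `llm` callable are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a Summarize-Analyze-Synthesize (SAS) style
# optimization loop. The `llm` callable (text -> text) stands in for
# any LLM client; the prompt layout is an assumption for illustration.
from dataclasses import dataclass, field


@dataclass
class Trace:
    params: list[float]  # policy parameters the robot executed
    reward: float        # human- or environment-derived feedback


@dataclass
class SASOptimizer:
    llm: callable                        # text -> text
    memory: list[Trace] = field(default_factory=list)

    def build_prompt(self) -> str:
        # One prompt carries all three stages: summarize past traces,
        # analyze what correlated with reward, synthesize new params.
        lines = ["Summarize the executed trials (params -> reward):"]
        for t in sorted(self.memory, key=lambda t: t.reward):
            lines.append(f"  params={t.params} reward={t.reward:.3f}")
        lines.append("Analyze which parameter changes improved reward.")
        lines.append("Synthesize one new parameter vector as a Python list.")
        return "\n".join(lines)

    def step(self, evaluate) -> Trace:
        reply = self.llm(self.build_prompt())
        params = parse_params(reply)          # proposed new policy
        trace = Trace(params, evaluate(params))
        self.memory.append(trace)             # grow trajectory memory
        return trace


def parse_params(reply: str) -> list[float]:
    # Naive extraction of the last bracketed list in the LLM's reply.
    start, end = reply.rfind("["), reply.rfind("]")
    return [float(x) for x in reply[start + 1:end].split(",")]
```

Because the LLM is injected as a plain callable, the same loop runs against any backend, and the prompt itself doubles as an explanation of each update, which is the explainability property the abstract highlights.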
Problem

Research questions and friction points this paper is trying to address.

Can LLMs iteratively self-improve robot policies without gradients or fine-tuning?
Can LLMs' numerical optimization ability make robot policy search explainable?
Can a single prompt combine retrieval, reasoning, and optimization to synthesize new, unseen behavior?
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs serve as gradient-free numerical optimizers for iterative robot self-improvement
The SAS Prompt implements explainable policy search entirely within a single LLM prompt
Retrieval, reasoning, and optimization over past robot traces are combined to synthesize new behavior
👥 Authors
H. B. Amor (School of Computing and Augmented Intelligence, Arizona State University)
L. Graesser (Google DeepMind)
Atil Iscen (Google)
David D'Ambrosio (Google DeepMind)
Saminda Abeyruwan (University of Miami)
Alex Bewley (Google DeepMind)
Yifan Zhou (School of Computing and Augmented Intelligence, Arizona State University)
Kamalesh Kalirathinam (School of Computing and Augmented Intelligence, Arizona State University)
Swaroop Mishra (Research Scientist, Google DeepMind)
Pannag R. Sanketi (Google DeepMind)