🤖 AI Summary
Traditional robot policy optimization relies on gradient-based learning or fine-tuning, which limits interpretability, generalizability, and applicability in low-data or real-world settings.
Method: We propose a large language model (LLM)-based iterative self-improvement framework for robot policies—requiring neither gradients nor parameter fine-tuning. We first uncover LLMs’ intrinsic capability for stochastic numerical optimization and design the SAS (Summarize–Analyze–Synthesize) prompting framework, which unifies policy reasoning, trajectory retrieval, feedback synthesis, and policy update within a single prompt. Integrated with a memory of robot-executed trajectories and iterative human- or environment-derived feedback, the framework enables autonomous policy evolution.
Contribution/Results: Evaluated on simulated and real-world table tennis tasks, our method significantly improves task success rate and cross-scenario behavioral generalization. The results empirically validate LLMs as effective, interpretable, gradient-free universal policy optimizers.
📝 Abstract
We demonstrate the ability of large language models (LLMs) to perform iterative self-improvement of robot policies. An important insight of this paper is that LLMs have a built-in ability to perform (stochastic) numerical optimization and that this property can be leveraged for explainable robot policy search. Based on this insight, we introduce the SAS Prompt (Summarize, Analyze, Synthesize) -- a single prompt that enables iterative learning and adaptation of robot behavior by combining the LLM's ability to retrieve, reason and optimize over previous robot traces in order to synthesize new, unseen behavior. Our approach can be regarded as an early example of a new family of explainable policy search methods that are entirely implemented within an LLM. We evaluate our approach both in simulation and on a real-robot table tennis task. Project website: sites.google.com/asu.edu/sas-llm/
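The iterative loop described above — summarize past robot traces, analyze them, and synthesize improved parameters in a single prompt — can be sketched as follows. This is a minimal illustration, not the paper's implementation: `stub_llm` is a placeholder that stands in for a real LLM call (it parses the trials back out of the prompt and nudges the best-seen parameters), the trajectory format and prompt wording are assumptions, and the objective is a toy one-dimensional reward.

```python
import ast
import re
from dataclasses import dataclass

@dataclass
class Trajectory:
    params: list   # numeric policy parameters
    reward: float  # scalar task feedback

def build_sas_prompt(memory):
    # A single prompt unifying the three SAS stages over the trace memory.
    lines = [
        "Summarize the robot trials below, analyze which parameters",
        "correlate with higher reward, then synthesize an improved",
        "parameter list (floats only).",
    ]
    for i, t in enumerate(memory):
        lines.append(f"trial {i}: params={t.params} reward={t.reward:.3f}")
    return "\n".join(lines)

def stub_llm(prompt):
    # Placeholder for a real LLM query: recovers the trials from the prompt
    # text, keeps the best-scoring parameters, and nudges them upward.
    # A real LLM would reason over the same text and propose its own update.
    best = None
    for m in re.finditer(r"params=(\[[^\]]*\]) reward=([-\d.]+)", prompt):
        p, r = ast.literal_eval(m.group(1)), float(m.group(2))
        if best is None or r > best[1]:
            best = (p, r)
    return [x + 0.5 for x in best[0]]

def reward_fn(params):
    # Toy objective: reward peaks at params == [3.0].
    return -(params[0] - 3.0) ** 2

# Gradient-free, prompt-only optimization: each iteration feeds the full
# trajectory memory back to the "LLM" and executes its proposed policy.
memory = [Trajectory([0.0], reward_fn([0.0]))]
for _ in range(5):
    new_params = stub_llm(build_sas_prompt(memory))
    memory.append(Trajectory(new_params, reward_fn(new_params)))

print(round(memory[-1].params[0], 1))  # parameters climb toward the optimum
```

Note that no gradients are computed anywhere: all "optimization" happens through text, which is what makes the search explainable — the same prompt that updates the policy can also be asked to justify the update.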