In-Context Iterative Policy Improvement for Dynamic Manipulation

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low-sample decision-making challenge in dynamic manipulation tasks, which are characterized by high-dimensional state spaces, complex dynamics, and partial observability. We propose the first iterative policy optimization framework leveraging the in-context learning ability of large language models (LLMs), eliminating the need for explicit parameter updates or fine-tuning. Instead, the method harnesses LLMs' few-shot generalization capability: given minimal interaction data, natural-language prompts guide iterative refinement of a parameterized policy. The framework integrates an attention-based LLM, a neural policy network, and reinforcement learning signals. We validate its efficacy both in simulation and on real robotic platforms. Experiments demonstrate substantial improvements over conventional reinforcement learning and imitation learning baselines across multiple dynamic manipulation tasks, with superior cross-task generalization and online adaptation capability.

📝 Abstract
Attention-based architectures trained on internet-scale language data have demonstrated state-of-the-art reasoning ability on various language-based tasks, such as logic problems and textual reasoning. Additionally, these Large Language Models (LLMs) have exhibited the ability to perform few-shot prediction via in-context learning, in which input-output examples provided in the prompt are generalized to new inputs. This ability furthermore extends beyond standard language tasks, enabling few-shot learning for general patterns. In this work, we consider the application of in-context learning with pre-trained language models to dynamic manipulation. Dynamic manipulation introduces several crucial challenges, including increased dimensionality, complex dynamics, and partial observability. To address this, we take an iterative approach and formulate our in-context learning problem as predicting adjustments to a parametric policy based on previous interactions. We show across several tasks, in simulation and on a physical robot, that utilizing in-context learning outperforms alternative methods in the low-data regime. A video summary of this work and experiments can be found at https://youtu.be/2inxpdrq74U?si=dAdDYsUEr25nZvRn.
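The iterative loop described in the abstract — show the LLM a prompt of previous (policy parameters, episode return) pairs, ask it to propose adjusted parameters, roll them out, and append the result to the context — can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `query_llm` is a hypothetical stand-in that mimics an LLM's pattern extrapolation with a simple numeric rule, and `format_prompt` shows the shape of the few-shot context a real LLM would receive.

```python
import json

def format_prompt(history):
    # Few-shot context: each prior interaction becomes one
    # "parameters -> return" example line in the prompt.
    lines = ["Propose new policy parameters that increase the return."]
    for params, ret in history:
        lines.append(f"params={json.dumps(params)} -> return={ret:.3f}")
    return "\n".join(lines)

def query_llm(prompt, history):
    # Hypothetical stand-in for a real LLM call. A real system would send
    # `prompt` to a pre-trained model and parse its proposed parameters;
    # here we extrapolate from the two best examples in the context.
    ranked = sorted(history, key=lambda h: h[1], reverse=True)
    best, second = ranked[0][0], ranked[1][0]
    return [b + 0.5 * (b - s) for b, s in zip(best, second)]

def improve_policy(rollout, init_params, iterations=10):
    # Seed the context with two interactions so there is a pattern to extend.
    perturbed = [p + 0.1 for p in init_params]
    history = [(init_params, rollout(init_params)),
               (perturbed, rollout(perturbed))]
    for _ in range(iterations):
        prompt = format_prompt(history)      # context shown to the "LLM"
        params = query_llm(prompt, history)  # proposed parameter adjustment
        history.append((params, rollout(params)))  # execute and record
    return max(history, key=lambda h: h[1])  # best (params, return) found
```

With a toy quadratic-return rollout, each iteration's proposal is appended to the context, so later proposals condition on a growing set of interactions — the in-context analogue of a policy-improvement step without any gradient update to the model.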
Problem

Research questions and friction points this paper is trying to address.

Applying in-context learning to dynamic manipulation tasks
Addressing high dimensionality and complex dynamics challenges
Improving parametric policies through iterative prediction adjustments
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-context learning with pre-trained language models
Iterative approach for parametric policy adjustments
Few-shot prediction for dynamic manipulation challenges