In-Context Iterative Policy Improvement for Dynamic Manipulation

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the low-sample decision-making challenge in dynamic manipulation tasks, which are characterized by high-dimensional state spaces, complex dynamics, and partial observability. We propose the first iterative policy optimization framework leveraging the in-context learning ability of large language models (LLMs), eliminating the need for explicit parameter updates or fine-tuning. Instead, the method harnesses LLMs' few-shot generalization capability: given minimal interaction data, natural-language prompts guide iterative refinement of a parameterized policy. The framework integrates an attention-based LLM, a neural policy network, and reinforcement learning signals. We validate its efficacy both in simulation and on real robotic platforms. Experiments demonstrate substantial improvements over conventional reinforcement learning and imitation learning baselines across multiple dynamic manipulation tasks, with superior cross-task generalization and online adaptation capability.

📝 Abstract
Attention-based architectures trained on internet-scale language data have demonstrated state-of-the-art reasoning ability on various language-based tasks, such as logic problems and textual reasoning. Additionally, these Large Language Models (LLMs) have exhibited the ability to perform few-shot prediction via in-context learning, in which input-output examples provided in the prompt are generalized to new inputs. This ability furthermore extends beyond standard language tasks, enabling few-shot learning for general patterns. In this work, we consider the application of in-context learning with pre-trained language models to dynamic manipulation. Dynamic manipulation introduces several crucial challenges, including increased dimensionality, complex dynamics, and partial observability. To address this, we take an iterative approach and formulate our in-context learning problem as predicting adjustments to a parametric policy based on previous interactions. We show across several tasks, in simulation and on a physical robot, that utilizing in-context learning outperforms alternative methods in the low-data regime. A video summary of this work and experiments can be found at https://youtu.be/2inxpdrq74U?si=dAdDYsUEr25nZvRn.
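The iterative loop described in the abstract — show the LLM a prompt of previous (policy parameters, episode return) pairs, ask it to propose adjusted parameters, roll them out, and append the result to the context — can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: `query_llm` is a hypothetical stand-in that mimics an LLM's pattern extrapolation with a simple numeric rule, and `format_prompt` shows the shape of the few-shot context a real LLM would receive.

```python
import json

def format_prompt(history):
    # Few-shot context: each prior interaction becomes one
    # "parameters -> return" example line in the prompt.
    lines = ["Propose new policy parameters that increase the return."]
    for params, ret in history:
        lines.append(f"params={json.dumps(params)} -> return={ret:.3f}")
    return "\n".join(lines)

def query_llm(prompt, history):
    # Hypothetical stand-in for a real LLM call. A real system would send
    # `prompt` to a pre-trained model and parse its proposed parameters;
    # here we extrapolate from the two best examples in the context.
    ranked = sorted(history, key=lambda h: h[1], reverse=True)
    best, second = ranked[0][0], ranked[1][0]
    return [b + 0.5 * (b - s) for b, s in zip(best, second)]

def improve_policy(rollout, init_params, iterations=10):
    # Seed the context with two interactions so there is a pattern to extend.
    perturbed = [p + 0.1 for p in init_params]
    history = [(init_params, rollout(init_params)),
               (perturbed, rollout(perturbed))]
    for _ in range(iterations):
        prompt = format_prompt(history)      # context shown to the "LLM"
        params = query_llm(prompt, history)  # proposed parameter adjustment
        history.append((params, rollout(params)))  # execute and record
    return max(history, key=lambda h: h[1])  # best (params, return) found
```

With a toy quadratic-return rollout, each iteration's proposal is appended to the context, so later proposals condition on a growing set of interactions — the in-context analogue of a policy-improvement step without any gradient update to the model.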
Problem

Research questions and friction points this paper is trying to address.

Applying in-context learning to dynamic manipulation tasks
Addressing high dimensionality and complex dynamics challenges
Improving parametric policies through iterative prediction adjustments
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-context learning with pre-trained language models
Iterative approach for parametric policy adjustments
Few-shot prediction for dynamic manipulation challenges