Bimanual Robot Manipulation via Multi-Agent In-Context Learning

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses few-shot bimanual robot manipulation, where standard in-context learning struggles because the high-dimensional joint action space and tight inter-arm coordination constraints quickly saturate the context window. The authors propose BiCICLe (Bimanual Coordinated In-Context Learning), a framework that formulates bimanual control as a multi-agent leader-follower problem and decouples the joint action space into sequential single-arm predictions, each conditioned on the other arm. It adds an “Arms' Debate” mechanism that iteratively refines the two arms' predictions, together with an LLM-as-Judge that selects the most plausible coordinated trajectory. According to the authors, BiCICLe is the first framework to apply off-the-shelf large language models, without fine-tuning, to few-shot bimanual control. It attains an average success rate of up to 71.1% across 13 tasks from the TWIN benchmark, 6.7 percentage points above the best training-free baseline and higher than most supervised methods, and it shows strong few-shot generalization to novel tasks.

📝 Abstract

Large Language Models (LLMs) have emerged as powerful reasoning engines for embodied control. In particular, In-Context Learning (ICL) enables off-the-shelf, text-only LLMs to predict robot actions without any task-specific training while preserving their generalization capabilities. However, applying ICL to bimanual manipulation remains challenging, as the high-dimensional joint action space and tight inter-arm coordination constraints rapidly overwhelm standard context windows. To address this, we introduce BiCICLe (Bimanual Coordinated In-Context Learning), the first framework that enables standard LLMs to perform few-shot bimanual manipulation without fine-tuning. BiCICLe frames bimanual control as a multi-agent leader-follower problem, decoupling the action space into sequential, conditioned single-arm predictions. This naturally extends to Arms' Debate, an iterative refinement process, and to the introduction of a third LLM-as-Judge to evaluate and select the most plausible coordinated trajectories. Evaluated on 13 tasks from the TWIN benchmark, BiCICLe achieves up to 71.1% average success rate, outperforming the best training-free baseline by 6.7 percentage points and surpassing most supervised methods. We further demonstrate strong few-shot generalization on novel tasks.
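
The leader-follower decomposition, debate rounds, and judge selection described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`build_prompt`, `leader_follower_step`, `select_with_judge`), the prompt wording, the choice of the left arm as leader, and the representation of an action as a list of floats are all assumptions made for illustration. The LLM and the judge are passed in as plain callables so that the control flow can be read and run without any model.

```python
from typing import Callable, List, Optional, Tuple

Action = List[float]                 # assumed single-arm action (e.g. a pose vector)
JointAction = Tuple[Action, Action]  # (leader action, follower action)


def build_prompt(arm: str, demos: List[str], observation: str,
                 partner_action: Optional[Action]) -> str:
    """Assemble a hypothetical single-arm ICL prompt (wording is illustrative)."""
    lines = [f"You control the {arm} arm."]
    lines += demos  # few-shot demonstrations for this arm only
    lines.append(f"Observation: {observation}")
    if partner_action is not None:
        # Conditioning on the other arm is what decouples the joint action space.
        lines.append(f"Other arm's proposed action: {partner_action}")
    lines.append("Action:")
    return "\n".join(lines)


def leader_follower_step(llm: Callable[[str], Action], demos: List[str],
                         observation: str, debate_rounds: int = 1) -> List[JointAction]:
    """Return one candidate joint action per round (initial pass plus debate rounds)."""
    # Initial pass: the leader predicts alone, the follower conditions on the leader.
    leader = llm(build_prompt("left", demos, observation, None))
    follower = llm(build_prompt("right", demos, observation, leader))
    candidates = [(leader, follower)]
    # Debate: each arm revises its action given the other arm's latest proposal.
    for _ in range(debate_rounds):
        leader = llm(build_prompt("left", demos, observation, follower))
        follower = llm(build_prompt("right", demos, observation, leader))
        candidates.append((leader, follower))
    return candidates


def select_with_judge(judge: Callable[[JointAction], float],
                      candidates: List[JointAction]) -> JointAction:
    """Pick the candidate the judge scores highest (first one wins on ties)."""
    scores = [judge(c) for c in candidates]
    return candidates[scores.index(max(scores))]
```

In this sketch each prompt carries only one arm's demonstrations plus the partner's latest proposal, which is one plausible reading of how sequential, conditioned single-arm predictions keep the context shorter than a joint two-arm prompt would. In practice `llm` and `judge` would wrap calls to a text-only model; how the paper prompts the judge and scores trajectories is not specified in this summary.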

Problem

Research questions and friction points this paper is trying to address.

bimanual manipulation

in-context learning

large language models

multi-agent coordination

embodied control

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bimanual Manipulation

In-Context Learning

Multi-Agent LLM