ALRM: Agentic LLM for Robotic Manipulation

📅 2026-01-27

🤖 AI Summary

This work addresses two limitations of current large language models (LLMs) in robotic manipulation: the lack of modular, agentic execution mechanisms, and the absence of benchmarks that systematically evaluate multi-step reasoning and linguistic variation. To this end, we propose the ALRM framework, which integrates policy generation with a ReAct-style reasoning loop and supports language-driven closed-loop planning and interpretable control through two modes: Code-as-Policy and Tool-as-Policy. We introduce a modular agent architecture that can plan, reflect on outcomes, and revise actions, alongside a novel simulation benchmark comprising 56 tasks across multiple environments with linguistically diverse instructions. Experiments with ten LLMs show that ALRM offers a scalable, interpretable, and modular way to connect natural language reasoning with robotic execution, with Claude-4.1-Opus (closed-source) and Falcon-H1-7B (open-source) achieving the best results under Code-as-Policy.

📝 Abstract

Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities. However, their integration in robotic control pipelines remains limited in two aspects: (1) prior LLM-based approaches often lack modular, agentic execution mechanisms, limiting their ability to plan, reflect on outcomes, and revise actions in a closed-loop manner; and (2) existing benchmarks for manipulation tasks focus on low-level control and do not systematically evaluate multistep reasoning and linguistic variation. In this paper, we propose Agentic LLM for Robot Manipulation (ALRM), an LLM-driven agentic framework for robotic manipulation. ALRM integrates policy generation with agentic execution through a ReAct-style reasoning loop, supporting two complementary modes: Code-as-Policy (CaP) for direct executable control code generation, and Tool-as-Policy (TaP) for iterative planning and tool-based action execution. To enable systematic evaluation, we also introduce a novel simulation benchmark comprising 56 tasks across multiple environments, capturing linguistically diverse instructions. Experiments with ten LLMs demonstrate that ALRM provides a scalable, interpretable, and modular approach for bridging natural language reasoning with reliable robotic execution. Results reveal Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.
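
To make the closed-loop idea concrete, the following is a minimal sketch of a ReAct-style loop in the spirit of the Tool-as-Policy mode: the agent proposes a thought and a tool call, observes the result, and revises its next action. It is not the ALRM implementation. The `TableWorld` class, the tool names (`pick`, `place_in_bin`, `list_objects`), and `scripted_policy` are all hypothetical, and the scripted policy merely stands in for an LLM call so the example runs offline.

```python
from typing import Callable, List, Optional, Tuple

# (thought, tool name, tool argument); all names here are illustrative assumptions.
Step = Tuple[str, str, str]


class TableWorld:
    """Toy stand-in for a simulator: objects on a table, one gripper, one bin."""

    def __init__(self, objects: List[str]) -> None:
        self.on_table = set(objects)
        self.holding: Optional[str] = None
        self.in_bin: List[str] = []

    def pick(self, obj: str) -> str:
        if self.holding is not None:
            return f"error: gripper already holds {self.holding}"
        if obj not in self.on_table:
            return f"error: no object named {obj}"
        self.on_table.remove(obj)
        self.holding = obj
        return f"ok: picked {obj}"

    def place_in_bin(self, _arg: str) -> str:
        if self.holding is None:
            return "error: gripper is empty"
        self.in_bin.append(self.holding)
        self.holding = None
        return "ok: placed in bin"

    def list_objects(self, _arg: str) -> str:
        return "objects: " + ", ".join(sorted(self.on_table))


def scripted_policy(observations: List[str]) -> Optional[Step]:
    """Stub replacing the LLM call; returns None once it considers the task done."""
    if not observations:
        return ("Pick the cube.", "pick", "cube")
    last = observations[-1]
    if last.startswith("error: no object"):
        # Reflect on the failure and gather more information before retrying.
        return ("The name was wrong; inspect the scene.", "list_objects", "")
    if last.startswith("objects:"):
        target = last.split(": ", 1)[1].split(", ")[0]
        return (f"Retry with {target}.", "pick", target)
    if last.startswith("ok: picked"):
        return ("Now drop it in the bin.", "place_in_bin", "")
    return None


def run_agent(world: TableWorld,
              policy: Callable[[List[str]], Optional[Step]],
              max_steps: int = 8) -> List[str]:
    """Reason -> act -> observe loop with a hard step budget."""
    tools = {
        "pick": world.pick,
        "place_in_bin": world.place_in_bin,
        "list_objects": world.list_objects,
    }
    observations: List[str] = []
    for _ in range(max_steps):
        step = policy(observations)
        if step is None:
            break
        _thought, tool, arg = step
        observations.append(tools[tool](arg))
    return observations


world = TableWorld(["red_cube"])
trace = run_agent(world, scripted_policy)
```

In this toy run the first `pick` fails because the object name is wrong, the agent inspects the scene, and the retry succeeds. A Code-as-Policy variant would instead have the model emit one executable script that calls the same tools directly, rather than choosing one tool call per loop iteration.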

Problem

Research questions and friction points this paper is trying to address.

robotic manipulation

large language models

agentic frameworks

multistep reasoning

linguistic variation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic LLM

Robotic Manipulation

ReAct-style reasoning