ALRM: Agentic LLM for Robotic Manipulation

📅 2026-01-27
🤖 AI Summary
This work addresses two limitations of current large language model (LLM) approaches to robotic manipulation: the lack of modular, agentic execution mechanisms, and the absence of benchmarks that systematically evaluate multi-step reasoning and linguistic variation. To this end, we propose the ALRM framework, which integrates policy generation with a ReAct-style reasoning loop and enables language-driven closed-loop planning and interpretable control through two modes: Code-as-Policy (CaP) and Tool-as-Policy (TaP). The agent can plan, reflect on outcomes, and revise its actions. We also introduce a simulation benchmark of 56 tasks across multiple environments with linguistically diverse instructions. Experiments with ten LLMs identify Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.

📝 Abstract
Large Language Models (LLMs) have recently empowered agentic frameworks to exhibit advanced reasoning and planning capabilities. However, their integration in robotic control pipelines remains limited in two aspects: (1) prior LLM-based approaches often lack modular, agentic execution mechanisms, limiting their ability to plan, reflect on outcomes, and revise actions in a closed-loop manner; and (2) existing benchmarks for manipulation tasks focus on low-level control and do not systematically evaluate multistep reasoning and linguistic variation. In this paper, we propose Agentic LLM for Robot Manipulation (ALRM), an LLM-driven agentic framework for robotic manipulation. ALRM integrates policy generation with agentic execution through a ReAct-style reasoning loop, supporting two complementary modes: Code-as-Policy (CaP) for direct executable control code generation, and Tool-as-Policy (TaP) for iterative planning and tool-based action execution. To enable systematic evaluation, we also introduce a novel simulation benchmark comprising 56 tasks across multiple environments, capturing linguistically diverse instructions. Experiments with ten LLMs demonstrate that ALRM provides a scalable, interpretable, and modular approach for bridging natural language reasoning with reliable robotic execution. Results reveal Claude-4.1-Opus as the top closed-source model and Falcon-H1-7B as the top open-source model under CaP.
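The difference between the two modes can be illustrated with a minimal sketch. This is not the authors' implementation: `ToyWorld`, `scripted_policy`, `tool_as_policy_loop`, and `code_as_policy` are hypothetical names, the scripted policy stands in for a real LLM call, and the toy world stands in for a simulator. Under those assumptions, TaP interleaves thought, tool call, and observation so the agent can revise a failed action, while CaP executes one generated program against the same tool API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (thought, action, observation)


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a manipulation simulator."""
    locations: Dict[str, str] = field(
        default_factory=lambda: {"red_block": "table", "bowl": "table"})
    held: Optional[str] = None

    def pick(self, obj: str) -> str:
        if self.held is not None:
            return f"error: already holding {self.held}"
        if obj not in self.locations:
            return f"error: unknown object {obj}"
        self.held = obj
        return f"ok: holding {obj}"

    def place(self, target: str) -> str:
        if self.held is None:
            return "error: nothing in gripper"
        self.locations[self.held] = target
        placed, self.held = self.held, None
        return f"ok: {placed} placed on {target}"


def scripted_policy(instruction: str, history: List[Step]) -> Tuple[str, str, str]:
    """Assumed stand-in for an LLM: returns (thought, tool, argument)."""
    if not history:
        return ("Try placing directly.", "place", "bowl")
    last_obs = history[-1][2]
    if last_obs.startswith("error: nothing in gripper"):
        # Reflection: the observation shows the plan skipped a step.
        return ("Placement failed; pick the block first.", "pick", "red_block")
    if last_obs.startswith("ok: holding"):
        return ("Holding the block; place it now.", "place", "bowl")
    return ("Task complete.", "finish", "")


def tool_as_policy_loop(instruction: str, world: ToyWorld,
                        policy: Callable[[str, List[Step]], Tuple[str, str, str]],
                        max_steps: int = 5) -> List[Step]:
    """ReAct-style loop: think, call one tool, observe, repeat."""
    tools = {"pick": world.pick, "place": world.place}
    history: List[Step] = []
    for _ in range(max_steps):
        thought, tool, arg = policy(instruction, history)
        if tool == "finish":
            break
        if tool in tools:
            observation = tools[tool](arg)
        else:
            observation = f"error: unknown tool {tool}"
        history.append((thought, f"{tool}({arg})", observation))
    return history


def code_as_policy(world: ToyWorld, generated_code: str) -> None:
    """Run a model-generated program that may only call pick/place.

    Emptying __builtins__ is NOT a real sandbox; it only keeps the sketch small.
    """
    namespace = {"pick": world.pick, "place": world.place}
    exec(generated_code, {"__builtins__": {}}, namespace)


if __name__ == "__main__":
    world = ToyWorld()
    for step in tool_as_policy_loop("put the red block in the bowl",
                                    world, scripted_policy):
        print(step)
    world2 = ToyWorld()
    code_as_policy(world2, "pick('red_block')\nplace('bowl')")
    print(world2.locations)
```

In this sketch the TaP trace records the failed first placement and the corrected pick-then-place sequence, whereas the CaP call receives no intermediate observations and succeeds only if the generated program is correct as written.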
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
large language models
agentic frameworks
multistep reasoning
linguistic variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic LLM
Robotic Manipulation
ReAct-style reasoning
Code-as-Policy
Tool-as-Policy
Vitor Gaboardi dos Santos
Technology Innovation Institute, Abu Dhabi, UAE; Dublin City University, Dublin, Ireland
Ibrahim Khadraoui
Research engineer, Technology Innovation Institute
Low latency video streaming, Internet of senses
Ibrahim Farhat
Researcher @TII
AI compression, 3D reconstruction, Video coding, Video Streaming
Hamza Yous
Technology Innovation Institute, Abu Dhabi, UAE
Samy Teffahi
Technology Innovation Institute, Abu Dhabi, UAE
Hakim Hacid
Technology Innovation Institute (TII), UAE
Machine Learning, LLM, Databases, Information Retrieval, Edge ML