AgenticLab: A Real-World Robot Agent Platform that Can See, Think, and Act

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing vision-language model (VLM)-based robotic approaches struggle to achieve long-horizon closed-loop execution in real-world unstructured environments and lack a unified, reproducible evaluation benchmark. This work proposes AgenticLab, a model-agnostic, real-world robotic agent platform that integrates real-time perception, task decomposition, online verification, and dynamic replanning into a closed-loop architecture. It establishes the first reproducible benchmark for evaluating VLM agents on open-world manipulation tasks. Through systematic evaluation of state-of-the-art VLM agents on physical robots, the study identifies critical failure modes, including multi-step grounding inconsistency, object grounding under occlusion and scene changes, and insufficient spatial reasoning. The complete software and hardware stack is open-sourced to advance research on general-purpose robotic agents.

πŸ“ Abstract
Recent advances in large vision-language models (VLMs) have demonstrated generalizable open-vocabulary perception and reasoning, yet their real-robot manipulation capability remains unclear for long-horizon, closed-loop execution in unstructured, in-the-wild environments. Prior VLM-based manipulation pipelines are difficult to compare across different research groups' setups, and many evaluations rely on simulation, privileged state, or specially designed environments. We present AgenticLab, a model-agnostic robot agent platform and benchmark for open-world manipulation. AgenticLab provides a closed-loop agent pipeline for perception, task decomposition, online verification, and replanning. Using AgenticLab, we benchmark state-of-the-art VLM-based agents on real-robot tasks in unstructured environments. Our benchmark reveals several failure modes that offline vision-language tests (e.g., VQA and static image understanding) fail to capture, including breakdowns in multi-step grounding consistency, object grounding under occlusion and scene changes, and insufficient spatial reasoning for reliable manipulation. We will release the full hardware and software stack to support reproducible evaluation and accelerate research on general-purpose robot agents.
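
To make the closed-loop pipeline concrete, here is a minimal Python sketch of the perceive, decompose, act, verify, replan cycle the abstract describes. All names (`ClosedLoopAgent`, `vlm.decompose`, `vlm.verify`, `robot.capture`, `robot.execute`) are hypothetical placeholders for illustration, not the released AgenticLab API.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    description: str    # natural-language instruction for a single step
    done: bool = False  # flipped once online verification confirms the step


class ClosedLoopAgent:
    """Minimal perceive -> decompose -> act -> verify -> replan cycle."""

    def __init__(self, vlm, robot, max_replans: int = 3):
        self.vlm = vlm                # any VLM backend (the platform is model-agnostic)
        self.robot = robot            # real-robot camera and motion interface
        self.max_replans = max_replans

    def run(self, goal: str) -> bool:
        obs = self.robot.capture()               # real-time perception
        plan = self.vlm.decompose(goal, obs)     # task decomposition -> list[Subtask]
        replans = 0
        while plan:
            step = plan[0]
            self.robot.execute(step.description)  # act on the next subtask
            obs = self.robot.capture()            # re-observe the scene after acting
            if self.vlm.verify(step.description, obs):  # online verification
                step.done = True
                plan.pop(0)                       # advance to the next subtask
            elif replans < self.max_replans:
                replans += 1
                plan = self.vlm.decompose(goal, obs)  # dynamic replanning
            else:
                return False                      # replanning budget exhausted
        return True
```

The point the benchmark stresses is visible in this loop: verification and replanning consume fresh observations at every step, so grounding errors compound across steps in ways a single static-image test would never expose.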
Problem

Research questions and friction points this paper is trying to address.

robot manipulation
vision-language models
real-world evaluation
open-world environments
closed-loop execution
Innovation

Methods, ideas, or system contributions that make the work stand out.

robot agent
vision-language models
closed-loop manipulation
open-world benchmark
real-world robotics