ActionReasoning: Robot Action Reasoning in 3D Space with LLM for Robotic Brick Stacking

📅 2026-02-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Traditional robotic systems rely on handcrafted planners with limited generalization capabilities, while existing vision-language-action approaches struggle in continuous action spaces due to the representational constraints of language tokens. This work proposes ActionReasoning, a framework that, for the first time, structures the physical priors and commonsense knowledge embedded in large language models (LLMs) into a multi-agent collaborative mechanism, enabling explicit action reasoning grounded in physical laws. The method generates executable manipulation plans that respect real-world dynamics, achieving stable brick stacking in tasks where precise environment states are known. By establishing a generalizable bridge from high-level semantic instructions to low-level motor execution, ActionReasoning significantly reduces reliance on domain-specific code while maintaining robust performance.

📝 Abstract
Classical robotic systems typically rely on custom planners designed for constrained environments. While effective in restricted settings, these systems lack generalization capabilities, limiting the scalability of embodied AI and general-purpose robots. Recent data-driven Vision-Language-Action (VLA) approaches aim to learn policies from large-scale simulation and real-world data. However, the continuous action space of the physical world significantly exceeds the representational capacity of linguistic tokens, making it unclear whether scaling data alone can yield general robotic intelligence. To address this gap, we propose ActionReasoning, an LLM-driven framework that performs explicit action reasoning to produce physics-consistent, prior-guided decisions for robotic manipulation. ActionReasoning leverages the physical priors and real-world knowledge already encoded in Large Language Models (LLMs) and structures them within a multi-agent architecture. We instantiate this framework on a tractable case study of brick stacking, where the environment states are assumed to be accurately measured. These states are serialized and passed to a multi-agent LLM framework that generates physics-aware action plans. Experiments demonstrate that the proposed multi-agent LLM framework enables stable brick placement while shifting effort from low-level domain-specific coding to high-level tool invocation and prompting, highlighting its potential for broader generalization. This work introduces a promising approach to bridging perception and execution in robotic manipulation by integrating physical reasoning with LLMs.
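The abstract's pipeline (measured brick states → serialized text → LLM planner agents → physics-aware plan) can be sketched minimally. The paper does not publish its serialization format or stability rules, so everything below — the `Brick` fields, the center-of-mass support check, and the prompt template — is an illustrative assumption, not the authors' implementation:

```python
import json
from dataclasses import dataclass

@dataclass
class Brick:
    """A measured brick pose: center position and box dimensions, in metres."""
    name: str
    x: float
    y: float
    z: float
    length: float  # footprint along x
    width: float   # footprint along y
    height: float

def serialize_state(bricks):
    """Serialize measured brick states into JSON text an LLM prompt can embed."""
    return json.dumps([vars(b) for b in bricks], indent=2)

def is_supported(top, base, tol=1e-6):
    """Crude physical prior: a placement is stable if the top brick rests on
    the base's upper surface and its center of mass lies over the base's
    footprint. A real system would use a richer check (friction, tilt)."""
    on_surface = abs((base.z + base.height / 2) - (top.z - top.height / 2)) < tol
    within_x = abs(top.x - base.x) <= base.length / 2
    within_y = abs(top.y - base.y) <= base.width / 2
    return on_surface and within_x and within_y

def build_planner_prompt(bricks, goal):
    """Hypothetical prompt for a planner agent in the multi-agent framework."""
    return (
        "You are a robot action planner. Current brick states (metres):\n"
        f"{serialize_state(bricks)}\n"
        f"Goal: {goal}\n"
        "Reply with a pick-and-place plan, one action per line."
    )
```

A checker agent could apply `is_supported` to each proposed placement and feed violations back to the planner, which is one plausible way to make the generated plans "physics-consistent" as the abstract describes.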
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
action reasoning
large language models
generalization
3D space
Innovation

Methods, ideas, or system contributions that make the work stand out.

ActionReasoning
LLM-driven reasoning
physical priors
multi-agent LLM
robotic manipulation