🤖 AI Summary
Existing robotic control approaches rely heavily on large-scale annotated datasets, suffering from limited generalizability and poor interpretability. This paper proposes a training-free embodied manipulation framework that enables end-to-end mapping from natural language instructions to robot trajectories via executable code as an intermediary representation. Leveraging large language models (LLMs) to generate interpretable control code—integrated with vision-language perception and motion planning modules—the framework accomplishes mobile manipulation tasks in open-world environments. It eliminates the need for fine-tuning or additional data collection, supports online geometric parameterization of novel objects and real-time trajectory generation, and thereby significantly enhances system transparency, cross-environment generalization, and adaptability to unseen objects and scenarios. Extensive experiments on a real mobile robot demonstrate stable performance across diverse, long-horizon manipulation tasks.
📝 Abstract
Recent advances in robot control methods, from end-to-end vision-language-action frameworks to modular systems with predefined primitives, have improved robots' ability to follow natural language instructions. Nonetheless, many approaches still struggle to scale to diverse environments, as they often rely on large annotated datasets and offer limited interpretability. In this work, we introduce EmbodiedCoder, a training-free framework for open-world mobile robot manipulation that leverages coding models to directly generate executable robot trajectories. By grounding high-level instructions in code, EmbodiedCoder enables flexible object geometry parameterization and manipulation trajectory synthesis without additional data collection or fine-tuning. This coding-based paradigm provides a transparent and generalizable way to connect perception with manipulation. Experiments on real mobile robots show that EmbodiedCoder achieves robust performance across diverse long-term tasks and generalizes effectively to novel objects and environments. Our results demonstrate an interpretable approach for bridging high-level reasoning and low-level control, moving beyond fixed primitives toward versatile robot intelligence. See the project page at: https://anonymous.4open.science/w/Embodied-Coder/