Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning

📅 2025-12-10

🤖 AI Summary

Existing AI agents are constrained by static, predefined action spaces (such as APIs, GUI interactions, or robotic commands), which limits their adaptability in open-world environments where the appropriate granularity of interaction varies from step to step. This paper introduces CrossAgent, a unified agentic model that adaptively selects and switches between heterogeneous action levels (API, GUI, and robot-level commands) within a single trajectory. Its core contributions are: (1) joint modeling of multi-level heterogeneous action spaces with a dynamic gating mechanism for context-aware action selection; and (2) Multi-Turn Group Relative Policy Optimization (GRPO), a reinforcement learning algorithm that applies group-relative policy optimization across multi-turn trajectories, letting the agent learn when to switch action levels without human-specified rules. The method follows a two-stage training paradigm, cold-start supervised fine-tuning followed by Multi-Turn GRPO, and leverages large-scale open-world trajectory data collected in Minecraft. Evaluated on more than 800 diverse tasks, CrossAgent achieves state-of-the-art performance and significantly outperforms fixed-action baselines, with marked improvements in long-horizon reasoning, generalization, and execution efficiency.
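The summary describes one model that chooses, at each step, between API, GUI, and robot-level actions. The following is a minimal sketch of how such a multi-level action space might be represented and routed, assuming a hypothetical text format `<level>: <payload>` for model outputs. `ActionLevel`, `CrossLevelAction`, `parse_action`, `dispatch`, and the stub executors are illustrative names, not part of the CrossAgent codebase, and the sketch does not model the gating mechanism or the learned policy that decides which level to use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class ActionLevel(Enum):
    """The three interaction granularities named in the summary."""
    API = "api"      # high-level call, e.g. a skill or tool invocation
    GUI = "gui"      # mid-level interface event, e.g. a click at screen coordinates
    ROBOT = "robot"  # low-level control command, e.g. raw movement or camera input


@dataclass(frozen=True)
class CrossLevelAction:
    level: ActionLevel
    payload: str  # level-specific argument string (hypothetical encoding)


def parse_action(text: str) -> CrossLevelAction:
    """Parse a model output of the hypothetical form '<level>: <payload>'."""
    level_name, sep, payload = text.partition(":")
    if not sep:
        raise ValueError(f"missing level prefix in {text!r}")
    # Enum lookup by value; raises ValueError for an unknown level name.
    return CrossLevelAction(ActionLevel(level_name.strip().lower()), payload.strip())


def dispatch(action: CrossLevelAction,
             executors: Dict[ActionLevel, Callable[[str], str]]) -> str:
    """Route one step to the executor registered for its action level."""
    return executors[action.level](action.payload)


# Stub executors standing in for real environment interfaces (hypothetical).
executors = {
    ActionLevel.API: lambda p: f"api call {p}",
    ActionLevel.GUI: lambda p: f"gui event {p}",
    ActionLevel.ROBOT: lambda p: f"low-level command {p}",
}

# A single trajectory that mixes all three levels.
trajectory = ["api: craft oak_planks", "gui: click 312 188", "robot: camera_yaw +15"]
for step in trajectory:
    print(dispatch(parse_action(step), executors))
```

Keeping the level as an explicit tag lets a single policy emit actions at any granularity while each environment interface stays separate; choosing the level for each step is the decision the trained model is meant to learn.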

📝 Abstract

The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces, such as exclusively using APIs, GUI events, or robotic commands. This rigidity limits their adaptability in dynamic environments where the optimal granularity of interaction varies contextually. To bridge this gap, we propose CrossAgent, a unified agentic model that masters heterogeneous action spaces and autonomously selects the most effective interface for each step of a trajectory. We introduce a comprehensive training pipeline that integrates cold-start supervised fine-tuning with a Multi-Turn Group Relative Policy Optimization (GRPO) algorithm. This approach enables the agent to learn adaptive action switching, balancing high-level efficiency with low-level precision, without human-specified rules. Extensive experiments on over 800 tasks in the open-world Minecraft environment demonstrate that CrossAgent achieves state-of-the-art performance. By dynamically leveraging the strengths of diverse action spaces, our model significantly outperforms fixed-action baselines, exhibiting superior generalization and efficiency in long-horizon reasoning. All code and models are available at https://github.com/CraftJarvis/OpenHA

Problem

Research questions and friction points this paper is trying to address.

Unified agentic model mastering heterogeneous action spaces

Autonomous selection of optimal interaction granularity per step

Adaptive action switching without human-specified rules

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified model masters heterogeneous action spaces autonomously

Training pipeline integrates cold-start supervised fine-tuning with Multi-Turn GRPO

Enables adaptive action switching that balances high-level efficiency with low-level precision
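The page names Multi-Turn GRPO but does not spell out its update. As a hedged illustration of the standard GRPO idea it builds on, the sketch below computes group-relative advantages for several rollouts of the same task and, as an assumption about the multi-turn extension, gives every turn in a trajectory that trajectory's advantage. It also shows the PPO-style clipped term that GRPO optimizes. The binary success reward, the function names, and the turn-level credit assignment are hypothetical and not taken from the paper; the KL penalty toward a reference policy is omitted.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each rollout's reward by the group mean and std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def per_turn_advantages(rewards: List[float], turns_per_traj: List[int]) -> List[List[float]]:
    """Assumed multi-turn credit assignment: every turn inherits its trajectory's advantage."""
    advs = group_relative_advantages(rewards)
    return [[a] * n for a, n in zip(advs, turns_per_traj)]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term (to be maximized); KL penalty omitted."""
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
    return min(ratio * advantage, clipped * advantage)


# A group of 4 rollouts for the same task (hypothetical reward: 1.0 = task succeeded).
rewards = [1.0, 0.0, 1.0, 0.0]
turns = [3, 5, 2, 4]  # number of agent turns in each rollout
for traj in per_turn_advantages(rewards, turns):
    print([round(a, 3) for a in traj])
# prints:
# [1.0, 1.0, 1.0]
# [-1.0, -1.0, -1.0, -1.0, -1.0]
# [1.0, 1.0]
# [-1.0, -1.0, -1.0, -1.0]
```

Because advantages are normalized within each group of rollouts, no separate value network (critic) is needed, which is the main design motivation behind GRPO.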
