🤖 AI Summary
This work addresses the challenge of aligning user intent with tool semantics in open-world scenarios, where tool repositories are massive and continually evolving and existing methods generalize poorly to unseen tools, yielding suboptimal retrieval and execution accuracy. To overcome these limitations, we propose ToolOmni, a unified agentic framework that gives large language models open-world tool-use capabilities through a reasoning loop that combines proactive retrieval with grounded execution. We construct a cold-start multi-turn interaction dataset for supervised fine-tuning and introduce a Decoupled Multi-Objective GRPO algorithm that jointly optimizes retrieval accuracy and execution efficacy in online environments. Experimental results show that ToolOmni achieves state-of-the-art performance in both retrieval and execution, surpassing strong baselines by 10.8% in end-to-end execution success rate.
📝 Abstract
Large Language Models (LLMs) enhance their problem-solving capability by utilizing external tools. However, in open-world scenarios with massive and evolving tool repositories, existing methods fall short: those relying on static embedding retrieval struggle to align user intent with tool semantics, while those relying on parameter memorization of tools fail to generalize to unseen tools. Both lead to suboptimal accuracy in open-world tool retrieval and execution. To address these limitations, we present ToolOmni, a unified agentic framework that enables LLMs to perform open-world tool use through proactive retrieval and grounded execution within a reasoning loop. First, we construct a cold-start multi-turn interaction dataset to instill foundational agentic capabilities via Supervised Fine-Tuning (SFT). Then, we introduce open-world tool learning based on a Decoupled Multi-Objective GRPO algorithm, which simultaneously optimizes LLMs for both tool retrieval accuracy and execution efficacy in online environments. Extensive experiments demonstrate that ToolOmni achieves state-of-the-art performance in both retrieval and execution, surpassing strong baselines by +10.8% in end-to-end execution success rate, while exhibiting exceptional robustness and generalization capabilities.
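The abstract does not spell out how the Decoupled Multi-Objective GRPO objective is computed. As a rough illustration only, the sketch below shows one possible reading: GRPO (Group Relative Policy Optimization) normalizes each rollout's reward against the other rollouts sampled for the same query, and a "decoupled" variant could apply that normalization to the retrieval reward and the execution reward separately before combining them. The function names, the equal weights, the use of population standard deviation, and the reward values are all assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: center and scale rewards within one group
    of rollouts sampled for the same query."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def decoupled_advantages(retrieval_rewards, execution_rewards,
                         w_retrieval=0.5, w_execution=0.5):
    """Hypothetical 'decoupled' combination: normalize each objective on its
    own scale first, then mix with fixed weights (weights are assumed)."""
    a_ret = group_relative_advantages(retrieval_rewards)
    a_exe = group_relative_advantages(execution_rewards)
    return [w_retrieval * ar + w_execution * ae
            for ar, ae in zip(a_ret, a_exe)]


# Made-up rewards for a group of four rollouts of the same query:
# 1.0 = correct tool retrieved / task executed successfully, 0.0 = failure.
retrieval = [1.0, 0.0, 1.0, 0.0]
execution = [1.0, 0.0, 0.0, 0.0]
advantages = decoupled_advantages(retrieval, execution)
```

Under these assumptions, the rollout that both retrieves the right tool and executes successfully receives the largest advantage, a rollout that only retrieves correctly receives a smaller one, and the two failed rollouts receive the same negative advantage. Normalizing each objective separately keeps one reward's scale from dominating the other, which is one plausible motivation for decoupling.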