Tool Calling for Arabic LLMs: Data Strategies and Instruction Tuning

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior research on LLM tool use has been predominantly confined to English, leaving a critical gap in multilingual settings—particularly for low-resource languages like Arabic. Method: This work presents the first systematic investigation of tool-use capabilities in Arabic LLMs. To address data scarcity, we introduce the first open-source Arabic tool-use dataset and propose a three-stage optimization framework: (1) cross-lingual transfer initialization, (2) general instruction tuning to enhance zero-shot generalization, and (3) targeted fine-tuning on high-priority tools (e.g., calculator, date parser) to strengthen domain-specific proficiency. Contribution/Results: Experiments demonstrate that localized Arabic data boosts tool-call accuracy by +28.6%; general instruction tuning yields consistent gains; and targeted fine-tuning further delivers substantial performance improvements. Our dataset and methodology establish foundational resources for developing practical Arabic AI agents.
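To make the task concrete, here is a minimal sketch of what a single Arabic tool-calling training example might look like, using a generic function-calling JSON layout and the calculator tool named above. The schema, field names, and Arabic query are illustrative assumptions, not the paper's actual dataset format.

```python
import json

# Hypothetical tool definition in a generic function-calling schema.
# Field names ("name", "parameters", etc.) are assumptions for illustration.
calculator_tool = {
    "name": "calculator",
    "description": "Evaluates a basic arithmetic expression.",
    "parameters": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "An arithmetic expression, e.g. '12 * 7'",
            }
        },
        "required": ["expression"],
    },
}

# One training example: an Arabic user query paired with the structured
# tool call the model should learn to emit.
example = {
    "tools": [calculator_tool],
    # Arabic for: "What is twelve multiplied by seven?"
    "user": "كم يساوي اثنا عشر مضروبًا في سبعة؟",
    "assistant_tool_call": {
        "name": "calculator",
        "arguments": {"expression": "12 * 7"},
    },
}

print(json.dumps(example, ensure_ascii=False, indent=2))
```

An in-language example like this pairs Arabic surface text with a language-independent call format, which is precisely the mapping the localized data is meant to teach.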

📝 Abstract
Tool calling is a critical capability that allows Large Language Models (LLMs) to interact with external systems, significantly expanding their utility. However, research and resources for tool calling are predominantly English-centric, leaving a gap in our understanding of how to enable this functionality for other languages, such as Arabic. This paper investigates three key research questions: (1) the necessity of in-language (Arabic) tool-calling data versus relying on cross-lingual transfer, (2) the effect of general-purpose instruction tuning on tool-calling performance, and (3) the value of fine-tuning on specific, high-priority tools. To address these questions, we conduct extensive experiments using base and post-trained variants of an open-weight Arabic LLM. To enable this study, we bridge the resource gap by translating and adapting two open-source tool-calling datasets into Arabic. Our findings provide crucial insights into the optimal strategies for developing robust tool-augmented agents for Arabic.
Problem

Research questions and friction points this paper is trying to address.

Enabling tool calling capabilities for Arabic language models
Investigating data strategies for Arabic tool calling versus cross-lingual transfer
Evaluating instruction tuning effects on Arabic LLM tool performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Translated tool-calling datasets into Arabic
Investigated in-language data versus cross-lingual transfer
Fine-tuned Arabic LLM on specific high-priority tools