ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models

📅 2025-05-19
🤖 AI Summary
Current large language models (LLMs) lack context-aware, personalized modeling in tool invocation, which leads to selection bias when several tools overlap in function and to lower user satisfaction. To address this, we introduce ToolSpectrum, the first benchmark for personalized tool calling, which formally defines a two-dimensional personalization paradigm grounded in “user profiling” and “environmental factors.” ToolSpectrum comprises multi-turn, real-world tasks and combines controlled-variable experiments, human annotation, and automated evaluation to enable fine-grained attribution analysis. Experimental results show that personalized tool invocation significantly improves user experience; however, state-of-the-art LLMs achieve only 61.8% average accuracy when reasoning jointly over both dimensions, revealing a critical capability gap. This work provides a benchmark, a formal framing, and an evaluation methodology for advancing personalized tool utilization in LLMs.

📝 Abstract
While integrating external tools into large language models (LLMs) enhances their ability to access real-time information and domain-specific services, existing approaches focus narrowly on functional tool selection following user instructions, overlooking context-aware personalization in tool selection. This oversight leads to suboptimal user satisfaction and inefficient tool utilization, particularly when overlapping toolsets require nuanced selection based on contextual factors. To bridge this gap, we introduce ToolSpectrum, a benchmark designed to evaluate LLMs' capabilities in personalized tool utilization. Specifically, we formalize two key dimensions of personalization, user profile and environmental factors, and analyze their individual and synergistic impacts on tool utilization. Through extensive experiments on ToolSpectrum, we demonstrate that personalized tool utilization significantly improves user experience across diverse scenarios. However, even state-of-the-art LLMs exhibit limited ability to reason jointly about user profiles and environmental factors, often prioritizing one dimension at the expense of the other. Our findings underscore the necessity of context-aware personalization in tool-augmented LLMs and reveal critical limitations of current models. Our data and code are available at https://github.com/Chengziha0/ToolSpectrum.
Problem

Research questions and friction points this paper is trying to address.

Enhancing personalized tool selection for LLMs beyond functional instructions
Addressing suboptimal tool use due to ignored contextual and user factors
Evaluating LLMs' ability to jointly reason about user and environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ToolSpectrum for personalized tool utilization
Formalizes user profile and environmental factors
Demonstrates improved user experience via personalization
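
As a rough illustration of what the two personalization dimensions mean in practice, the toy Python sketch below picks one tool out of several that overlap in function. It is not ToolSpectrum's data format, evaluation code, or method: every name here (`Tool`, `UserProfile`, `Environment`, `select_tool`, the example tools and their attributes) is hypothetical, and the rule "environment filters, profile ranks" is an assumption made only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Tool:
    # Hypothetical attributes, invented for this illustration only.
    name: str
    price_level: int      # 1 = budget, 3 = premium
    needs_network: bool


@dataclass(frozen=True)
class UserProfile:
    budget_level: int     # highest price level the user is comfortable with


@dataclass(frozen=True)
class Environment:
    network_available: bool


def select_tool(candidates: Sequence[Tool],
                profile: UserProfile,
                env: Environment) -> Optional[Tool]:
    """Choose among functionally overlapping tools.

    Assumed rule: the environment is a hard filter, and the user profile
    ranks whatever survives the filter.
    """
    usable = [t for t in candidates
              if env.network_available or not t.needs_network]
    if not usable:
        return None
    # Prefer tools within budget, then the one closest to the budget level.
    return min(
        usable,
        key=lambda t: (t.price_level > profile.budget_level,
                       abs(t.price_level - profile.budget_level)),
    )


# Three tools that could all serve the same "get me across town" request.
TOOLS = [
    Tool("premium_ride", price_level=3, needs_network=True),
    Tool("budget_ride", price_level=1, needs_network=True),
    Tool("offline_transit_map", price_level=1, needs_network=False),
]

if __name__ == "__main__":
    for profile, env in [
        (UserProfile(budget_level=1), Environment(network_available=True)),
        (UserProfile(budget_level=3), Environment(network_available=True)),
        (UserProfile(budget_level=3), Environment(network_available=False)),
    ]:
        print(profile, env, "->", select_tool(TOOLS, profile, env).name)
```

The same request resolves to a different tool depending on both the profile and the environment. In the third case the environment overrides what the profile alone would suggest, which is the kind of joint reasoning the benchmark is said to probe.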
Zihao Cheng
School of Computer Science and Engineering, Beihang University, Beijing, China; University of Science and Technology Beijing, Beijing, China
Hongru Wang
The Chinese University of Hong Kong, Hong Kong, China
Zeming Liu
School of Computer Science and Engineering, Beihang University, Beijing, China
Yuhang Guo
School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
Yuanfang Guo
Beihang University
Multimedia security, AI security, Graph Neural Networks, Multimedia processing
Yunhong Wang
Professor, School of Computer Science and Engineering, Beihang University
Biometrics, Pattern Recognition, Image Processing, Computer Vision
Haifeng Wang
Baidu Inc., Beijing, China