ToolSpectrum: Towards Personalized Tool Utilization for Large Language Models

📅 2025-05-19

🤖 AI Summary

Current large language models (LLMs) lack context-aware, personalized modeling in tool invocation. This leads to selection bias when tools overlap and to diminished user satisfaction. To address this, we introduce ToolSpectrum, the first benchmark for personalized tool calling, which formally defines a dual-dimensional personalization paradigm grounded in "user profiling" and "environmental factors." ToolSpectrum comprises multi-turn, real-world tasks and combines controlled-variable experiments, human annotation, and automated evaluation to enable fine-grained attribution analysis. Experimental results show that personalized tool invocation significantly improves user experience. However, state-of-the-art LLMs achieve only 61.8% average accuracy on dual-dimensional joint reasoning, which reveals a critical capability gap. This work establishes foundational benchmarks, theoretical framing, and evaluation methodologies for advancing personalized tool utilization in LLMs.

📝 Abstract

While integrating external tools into large language models (LLMs) enhances their ability to access real-time information and domain-specific services, existing approaches focus narrowly on functional tool selection that follows user instructions and overlook context-aware personalization in tool selection. This oversight leads to suboptimal user satisfaction and inefficient tool utilization, particularly when overlapping toolsets require nuanced selection based on contextual factors. To bridge this gap, we introduce ToolSpectrum, a benchmark designed to evaluate LLMs' capabilities in personalized tool utilization. Specifically, we formalize two key dimensions of personalization (user profile and environmental factors) and analyze their individual and synergistic impacts on tool utilization. Through extensive experiments on ToolSpectrum, we demonstrate that personalized tool utilization significantly improves user experience across diverse scenarios. However, even state-of-the-art LLMs exhibit limited ability to reason jointly about user profiles and environmental factors, often prioritizing one dimension at the expense of the other. Our findings underscore the necessity of context-aware personalization in tool-augmented LLMs and reveal critical limitations of current models. Our data and code are available at https://github.com/Chengziha0/ToolSpectrum.
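Here is a toy illustration of why overlapping tools call for joint reasoning over both dimensions. The sketch scores candidate tools against needs derived from a user profile and from environmental factors. Everything in it is hypothetical: the tool names, the tag labels, the `select_tool`/`accuracy` helpers, and the overlap-count scoring rule. None of it reflects the ToolSpectrum data format or the paper's evaluation method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    tags: frozenset  # hypothetical capability labels the tool offers


@dataclass(frozen=True)
class Context:
    profile_needs: frozenset  # needs implied by the user profile (hypothetical)
    env_needs: frozenset      # needs implied by environmental factors (hypothetical)


def select_tool(tools, ctx, use_profile=True, use_env=True):
    """Pick the tool covering the most active needs; ties go to the earliest tool."""
    needs = set()
    if use_profile:
        needs |= ctx.profile_needs
    if use_env:
        needs |= ctx.env_needs
    # max() returns the first maximal element, so list order breaks ties.
    return max(tools, key=lambda tool: len(needs & tool.tags))


def accuracy(instances, **kwargs):
    """Fraction of (context, tools, gold_name) instances where the selection matches gold."""
    hits = sum(select_tool(tools, ctx, **kwargs).name == gold for ctx, tools, gold in instances)
    return hits / len(instances)


# Hypothetical scenario: a vegetarian user asks for dinner options on a rainy evening.
TOOLS = [
    Tool("veg_picnic_planner", frozenset({"vegetarian", "outdoor"})),
    Tool("bbq_delivery", frozenset({"delivery", "meat"})),
    Tool("veg_delivery", frozenset({"vegetarian", "delivery"})),
]
CTX = Context(profile_needs=frozenset({"vegetarian"}), env_needs=frozenset({"delivery"}))
INSTANCES = [(CTX, TOOLS, "veg_delivery")]

for label, kw in [("profile only", {"use_env": False}),
                  ("environment only", {"use_profile": False}),
                  ("joint", {})]:
    print(f"{label}: {select_tool(TOOLS, CTX, **kw).name}, accuracy={accuracy(INSTANCES, **kw)}")
```

Each single-dimension run picks a tool that satisfies only half of the context. The joint run recovers the intended tool. This is a miniature version of the failure mode described above, where a model prioritizes one dimension at the expense of the other.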

Problem

Research questions and friction points this paper is trying to address.

Enhancing personalized tool selection for LLMs beyond functional instructions

Addressing suboptimal tool use due to ignored contextual and user factors

Evaluating LLMs' ability to jointly reason about user and environment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ToolSpectrum, a benchmark for evaluating personalized tool utilization

Formalizes user profile and environmental factors as two dimensions of personalization

Demonstrates improved user experience via personalization
