BiasBusters: Uncovering and Mitigating Tool Selection Bias in Large Language Models

📅 2025-09-30

📈 Citations: 0

✨ Influential: 0

career value

164K/year

🤖 AI Summary

This work identifies systematic biases in large language models (LLMs) when selecting functionally equivalent tools from multiple sources, undermining fairness and distorting market competition. To address this, we introduce the first benchmark encompassing diverse categories of semantically equivalent tools and conduct the first systematic quantification of how semantic alignment, description quality, and pretraining exposure frequency jointly drive selection bias. We propose a lightweight, training-free debiasing strategy comprising semantic-matching-based tool filtering and uniform sampling. Evaluated across seven mainstream LLMs, our approach reduces selection bias by an average of 42.6% while preserving over 98.3% task coverage. The method enhances fairness, robustness, and interpretability of tool invocation without requiring fine-tuning or additional model training.

Technology Category

Application Category

📝 Abstract

Agents backed by large language models (LLMs) often rely on external tools drawn from marketplaces where multiple providers offer functionally equivalent options. This raises a critical point concerning fairness: if selection is systematically biased, it can degrade user experience and distort competition by privileging some providers over others. We introduce a benchmark of diverse tool categories, each containing multiple functionally equivalent tools, to evaluate tool-selection bias. Using this benchmark, we test seven models and show that unfairness exists with models either fixating on a single provider or disproportionately preferring earlier-listed tools in context. To investigate the origins of this bias, we conduct controlled experiments examining tool features, metadata (name, description, parameters), and pre-training exposure. We find that: (1) semantic alignment between queries and metadata is the strongest predictor of choice; (2) perturbing descriptions significantly shifts selections; and (3) repeated pre-training exposure to a single endpoint amplifies bias. Finally, we propose a lightweight mitigation that first filters the candidate tools to a relevant subset and then samples uniformly, reducing bias while preserving good task coverage. Our findings highlight tool-selection bias as a key obstacle for the fair deployment of tool-augmented LLMs.

Problem

Research questions and friction points this paper is trying to address.

Detecting systematic bias in LLM tool selection from marketplaces

Investigating semantic and exposure factors causing unfair provider preference

Proposing lightweight mitigation for fair tool-augmented LLM deployment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark evaluates bias in tool selection

Controlled experiments identify semantic alignment causes

Lightweight filtering and uniform sampling mitigate bias

🔎 Similar Papers

LangBiTe: A Platform for Testing Bias in Large Language Models