🤖 AI Summary
This study addresses the critical security gaps in current open-source large language models (LLMs) when deployed for function calling, where default configurations lack built-in safety mechanisms and remain vulnerable to adversarial attacks. The authors present the first systematic red-teaming evaluation, assessing the robustness of four prominent open-source LLMs under three representative attack vectors and comprehensively evaluating the efficacy of eight existing defense strategies. Leveraging an automated evaluation framework that integrates adversarial attack simulation with function-calling interface testing, the research reveals that all examined models exhibit significant security vulnerabilities in their default settings. Moreover, none of the evaluated defenses adequately meet the safety requirements for real-world deployment, thereby exposing a fundamental security shortfall in LLM-based function-calling agents.
📝 Abstract
We present an experimental evaluation assessing the robustness of four open-source LLMs that claim function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show that these models are not safe by default and that the defences are not yet ready for deployment in real-world scenarios.
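To make the attack surface concrete, the sketch below shows a minimal function-calling setup of the kind such evaluations probe: a tool exposed to the model via a JSON schema, and a toy guard that vets the model's emitted tool calls. All names (`delete_file`, `is_safe_call`, the sandbox path convention) are illustrative assumptions, not the authors' actual framework or any specific model's API.

```python
# Hypothetical sketch of an LLM function-calling attack surface.
# The tool schema and the guard below are illustrative assumptions only.

# A minimal tool definition in an OpenAI-style JSON schema.
TOOL_SCHEMA = {
    "name": "delete_file",
    "description": "Delete a file from the user's workspace.",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def is_safe_call(call: dict) -> bool:
    """Toy defence: only allow tool calls whose path stays inside a sandbox.

    Real deployments need far stronger checks; this is a sketch of the
    kind of guardrail the evaluated defences attempt to provide.
    """
    path = call.get("arguments", {}).get("path", "")
    return path.startswith("/sandbox/") and ".." not in path

# A call an adversarial prompt might coerce the model into emitting,
# versus a benign call the user actually intended.
adversarial_call = {"name": "delete_file", "arguments": {"path": "/etc/passwd"}}
benign_call = {"name": "delete_file", "arguments": {"path": "/sandbox/tmp.txt"}}

print(is_safe_call(adversarial_call))  # False: blocked by the guard
print(is_safe_call(benign_call))       # True: allowed
```

The point of the paper is precisely that guards like this toy one, and the eight real defences evaluated, do not yet hold up against the attack vectors tested.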