Blue Teaming Function-Calling Agents

📅 2026-01-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the critical security gaps in current open-source large language models (LLMs) when deployed for function calling, where default configurations lack built-in safety mechanisms and remain vulnerable to adversarial attacks. The authors present the first systematic red-teaming evaluation, assessing the robustness of four prominent open-source LLMs under three representative attack vectors and comprehensively evaluating the efficacy of eight existing defense strategies. Leveraging an automated evaluation framework that integrates adversarial attack simulation with function-calling interface testing, the research reveals that all examined models exhibit significant security vulnerabilities in their default settings. Moreover, none of the evaluated defenses adequately meet the safety requirements for real-world deployment, thereby exposing a fundamental security shortfall in LLM-based function-calling agents.

Technology Category

Application Category

📝 Abstract
We present an experimental evaluation that assesses the robustness of four open source LLMs claiming function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show how these models are not safe by default, and how the defences are not yet employable in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

function-calling agents
robustness
LLM security
adversarial attacks
defence mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

function-calling agents
robustness evaluation
adversarial attacks
defense mechanisms
large language models
G
Greta Dolcetti
Ca’ Foscari University of Venice, Venice, Italy
Giulio Zizzo
Giulio Zizzo
Research Scientist, IBM Research
Machine LearningSecurityAdversarial MLFederated Learning
S
S. Maffeis
Imperial College London, London, UK