🤖 AI Summary
This study addresses the critical security gaps in current open-source large language models (LLMs) when deployed for function calling, where default configurations lack built-in safety mechanisms and remain vulnerable to adversarial attacks. The authors present the first systematic red-teaming evaluation, assessing the robustness of four prominent open-source LLMs under three representative attack vectors and comprehensively evaluating the efficacy of eight existing defense strategies. Leveraging an automated evaluation framework that integrates adversarial attack simulation with function-calling interface testing, the research reveals that all examined models exhibit significant security vulnerabilities in their default settings. Moreover, none of the evaluated defenses adequately meet the safety requirements for real-world deployment, thereby exposing a fundamental security shortfall in LLM-based function-calling agents.
📝 Abstract
We present an experimental evaluation assessing the robustness of four open-source LLMs that claim function-calling capabilities against three different attacks, and we measure the effectiveness of eight different defences. Our results show that these models are not safe by default and that the defences are not yet ready for deployment in real-world scenarios.
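To make the attack surface concrete, the sketch below shows a minimal function-calling setup of the kind such evaluations probe: a tool exposed to the model via a JSON schema, and a toy guard that vets the model's emitted tool calls. All names (`delete_file`, `is_safe_call`, the sandbox path convention) are illustrative assumptions, not the authors' actual framework or any specific model's API.

```python
# Hypothetical sketch of an LLM function-calling attack surface.
# The tool schema and the guard below are illustrative assumptions only.

# A minimal tool definition in an OpenAI-style JSON schema.
TOOL_SCHEMA = {
    "name": "delete_file",
    "description": "Delete a file from the user's workspace.",
    "parameters": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def is_safe_call(call: dict) -> bool:
    """Toy defence: only allow tool calls whose path stays inside a sandbox.

    Real deployments need far stronger checks; this is a sketch of the
    kind of guardrail the evaluated defences attempt to provide.
    """
    path = call.get("arguments", {}).get("path", "")
    return path.startswith("/sandbox/") and ".." not in path

# A call an adversarial prompt might coerce the model into emitting,
# versus a benign call the user actually intended.
adversarial_call = {"name": "delete_file", "arguments": {"path": "/etc/passwd"}}
benign_call = {"name": "delete_file", "arguments": {"path": "/sandbox/tmp.txt"}}

print(is_safe_call(adversarial_call))  # False: blocked by the guard
print(is_safe_call(benign_call))       # True: allowed
```

The point of the paper is precisely that guards like this toy one, and the eight real defences evaluated, do not yet hold up against the attack vectors tested.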