TreeTeaming: Autonomous Red-Teaming of Vision-Language Models via Hierarchical Strategy Exploration

πŸ“… 2026-03-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing red-teaming approaches for vision-language models (VLMs) rely on static or linear strategies, limiting their ability to uncover diverse and subtle safety vulnerabilities. This work proposes TreeTeaming, a novel framework that formulates red-team strategy exploration as a dynamically evolving hierarchical policy tree. A large language model–driven orchestrator automatically constructs and expands adversarial attack paths within this tree structure. The approach substantially enhances attack diversity, stealth, and success rates: it achieves state-of-the-art performance on 11 out of 12 mainstream VLMs, attaining an 87.60% success rate on GPT-4o while simultaneously reducing average toxicity by 23.09%.

πŸ“ Abstract
The rapid advancement of Vision-Language Models (VLMs) has brought their safety vulnerabilities into sharp focus. However, existing red-teaming methods are fundamentally constrained by an inherent linear exploration paradigm, confining them to optimizing within a predefined strategy set and preventing the discovery of novel, diverse exploits. To transcend this limitation, we introduce TreeTeaming, an automated red-teaming framework that reframes strategy exploration from static testing into a dynamic, evolutionary discovery process. At its core lies a strategic Orchestrator, powered by a Large Language Model (LLM), which autonomously decides whether to evolve promising attack paths or explore diverse strategic branches, thereby dynamically constructing and expanding a strategy tree. A multimodal actuator is then tasked with executing these complex strategies. In experiments across 12 prominent VLMs, TreeTeaming achieves state-of-the-art attack success rates on 11 models, outperforming existing methods and reaching up to 87.60% on GPT-4o. The framework also demonstrates greater strategic diversity than the union of previously published jailbreak strategies. Furthermore, the generated attacks exhibit an average toxicity reduction of 23.09%, showcasing their stealth and subtlety. Our work introduces a new paradigm for automated vulnerability discovery, underscoring the necessity of proactive exploration beyond static heuristics to secure frontier AI models.
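The core evolve-vs-explore loop described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the node structure, the `propose_variant`/`propose_novel`/`evaluate` callables (which would be LLM- and VLM-backed in TreeTeaming), and the fixed exploration probability are all assumptions made for illustration.

```python
import random


class StrategyNode:
    """One node in the strategy tree: an attack strategy and its observed score."""

    def __init__(self, strategy, parent=None):
        self.strategy = strategy
        self.parent = parent
        self.children = []
        self.score = 0.0  # e.g. observed attack success rate for this strategy


def all_nodes(node):
    """Collect every node in the subtree rooted at `node`."""
    out = [node]
    for child in node.children:
        out.extend(all_nodes(child))
    return out


def orchestrate(root, propose_variant, propose_novel, evaluate,
                steps=20, explore_prob=0.3, seed=0):
    """Grow the strategy tree for `steps` iterations.

    Each iteration either *evolves* the best-scoring node (refining a
    promising attack path) or *explores* by branching a novel strategy
    off a randomly chosen node. In the paper this decision is made by
    an LLM orchestrator; here it is a simple probabilistic stand-in.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        nodes = all_nodes(root)
        if rng.random() < explore_prob:
            # Explore: diversify from an arbitrary existing node.
            parent = rng.choice(nodes)
            child_strategy = propose_novel(parent.strategy)
        else:
            # Evolve: refine the most successful strategy found so far.
            parent = max(nodes, key=lambda n: n.score)
            child_strategy = propose_variant(parent.strategy)
        child = StrategyNode(child_strategy, parent)
        child.score = evaluate(child_strategy)
        parent.children.append(child)
    return root
```

A toy run with placeholder proposal and scoring functions shows the tree growing one node per step while the orchestrator alternates between refinement and diversification:

```python
root = StrategyNode("base jailbreak prompt")
tree = orchestrate(root,
                   propose_variant=lambda s: s + " [refined]",
                   propose_novel=lambda s: s + " [new branch]",
                   evaluate=lambda s: len(s) * 0.001,
                   steps=10)
print(len(all_nodes(tree)))  # 11 nodes: the root plus one per step
```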
Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models
red teaming
safety vulnerabilities
strategy exploration
automated attack
Innovation

Methods, ideas, or system contributions that make the work stand out.

TreeTeaming
hierarchical strategy exploration
autonomous red-teaming
vision-language models
LLM-driven orchestrator
πŸ”Ž Similar Papers
No similar papers found.