STAR-Teaming: A Strategy-Response Multiplex Network Approach to Automated LLM Red Teaming

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the vulnerability of large language models (LLMs) to jailbreak prompts that elicit harmful outputs. The authors propose STAR-Teaming, a novel black-box framework for automated red teaming that models red-teaming dynamics with a strategy-response multiplex network. By combining a multi-agent system with network-driven optimization, the framework recasts the high-dimensional embedding space as a tractable network organized into interpretable semantic communities, which guide the sampling of attack strategies and avoid redundant exploration. The authors report a higher attack success rate (ASR) at a lower computational cost than existing methods, along with better interpretability of the target model's strategic vulnerabilities.

📝 Abstract
While Large Language Models (LLMs) are widely used, they remain susceptible to jailbreak prompts that can elicit harmful or inappropriate responses. This paper introduces STAR-Teaming, a novel black-box framework for automated red teaming that effectively generates such prompts. STAR-Teaming integrates a Multi-Agent System (MAS) with a Strategy-Response Multiplex Network and employs network-driven optimization to sample effective attack strategies. This network-based approach recasts the intractable high-dimensional embedding space into a tractable structure, yielding two key advantages: it enhances the interpretability of the LLM's strategic vulnerabilities, and it streamlines the search for effective strategies by organizing the search space into semantic communities, thereby preventing redundant exploration. Empirical results demonstrate that STAR-Teaming significantly surpasses existing methods, achieving a higher attack success rate (ASR) at a lower computational cost. Extensive experiments validate the effectiveness and explainability of the Multiplex Network. The code is available at https://github.com/selectstar-ai/STAR-Teaming-paper.
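
The abstract describes organizing the strategy search space into semantic communities and sampling from them. The following is a minimal sketch of that general idea, not the authors' implementation (which is in the linked repository). Everything here is an assumption made for illustration: the toy 2-D embeddings, the strategy names, the similarity threshold, merging two layers by keeping edges present in both, the use of connected components as a stand-in for real community detection, and the smoothed success-rate scoring rule.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_layer(embeddings, threshold):
    """One network layer: link nodes whose cosine similarity >= threshold."""
    names = list(embeddings)
    edges = {name: set() for name in names}  # isolated nodes are kept
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(embeddings[a], embeddings[b]) >= threshold:
                edges[a].add(b)
                edges[b].add(a)
    return edges

def merge_layers(layer_a, layer_b):
    """Assumed merge rule: keep an edge only if both layers contain it."""
    return {node: layer_a[node] & layer_b[node] for node in layer_a}

def communities(edges):
    """Connected components, a simple stand-in for community detection."""
    seen, groups = set(), []
    for start in edges:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups

def pick_strategy(groups, stats, rng):
    """Choose the community with the best smoothed success rate,
    then a random strategy inside it. stats[i] = (successes, attempts)."""
    def score(idx):
        wins, tries = stats.get(idx, (0, 0))
        return (wins + 1) / (tries + 2)  # Laplace smoothing
    best = max(range(len(groups)), key=score)
    return best, rng.choice(groups[best])

def attack_success_rate(outcomes):
    """ASR = successful attacks / total attempts."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Hypothetical toy data: the same four strategies embedded in two layers.
strategy_emb = {"roleplay_a": [1.0, 0.1], "roleplay_b": [0.9, 0.2],
                "encoding_a": [0.1, 1.0], "encoding_b": [0.2, 0.9]}
response_emb = {"roleplay_a": [0.8, 0.3], "roleplay_b": [1.0, 0.2],
                "encoding_a": [0.0, 1.0], "encoding_b": [0.3, 1.0]}

merged = merge_layers(similarity_layer(strategy_emb, 0.9),
                      similarity_layer(response_emb, 0.9))
groups = communities(merged)
stats = {0: (0, 3), 1: (2, 3)}  # made-up per-community attack outcomes
best, strategy = pick_strategy(groups, stats, random.Random(0))
print(groups, best, strategy)
```

Sampling at the community level rather than per strategy is what avoids re-testing near-duplicate strategies in this sketch: once a community has a poor record, all of its members are deprioritized together.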
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
jailbreak prompts
red teaming
adversarial attacks
model safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiplex Network
Red Teaming
Multi-Agent System
Strategy Optimization
Black-box Attack
Authors
MinJae Jung
DATUMO INC.
YongTaek Lim
University of Seoul
AI, AI Fairness, Recommender System
Chaeyun Kim
Seoul National University
Multimodal, Visual Grounding, Representation Learning
Junghwan Kim
DATUMO INC.
Kihyun Kim
DATUMO INC.
Minwoo Kim
DATUMO INC.