MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies

📅 2025-08-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the rigidity and poor adaptability of existing jailbreaking strategies against black-box large language models (LLMs), this paper proposes MAJIC, a Markov chain-based adaptive jailbreaking framework. It treats diverse disguise strategies as states of a Markov chain, maintains a pool of these strategies, and updates the transition probabilities between them based on attack outcomes, so that strategy selection and fusion are optimized online. By combining static prompt engineering with dynamic feedback, the framework supports multi-round iterative attacks. Evaluated on mainstream black-box LLMs, including GPT-4o and Gemini-2.0-flash, it achieves an attack success rate above 90% with fewer than 15 queries per attempt on average, significantly outperforming existing black-box jailbreaking methods. The core contribution is the formulation of sequential strategy selection and fusion as a Markov chain, which aims to improve the generalizability, robustness, and query efficiency of the attack.

📝 Abstract

Large Language Models (LLMs) have exhibited remarkable capabilities but remain vulnerable to jailbreaking attacks, which can elicit harmful content from the models by manipulating the input prompts. Existing black-box jailbreaking techniques primarily rely on static prompts crafted with a single, non-adaptive strategy, or employ rigid combinations of several underperforming attack methods, which limits their adaptability and generalization. To address these limitations, we propose MAJIC, a Markovian adaptive jailbreaking framework that attacks black-box LLMs by iteratively combining diverse innovative disguise strategies. MAJIC first establishes a "Disguise Strategy Pool" by refining existing strategies and introducing several innovative approaches. To further improve attack performance and efficiency, MAJIC formulates the sequential selection and fusion of strategies in the pool as a Markov chain. Under this formulation, MAJIC initializes and employs a Markov matrix to guide the strategy composition, where transition probabilities between strategies are dynamically adapted based on attack outcomes, thereby enabling MAJIC to learn and discover effective attack pathways tailored to the target model. Our empirical results demonstrate that MAJIC significantly outperforms existing jailbreak methods on prominent models such as GPT-4o and Gemini-2.0-flash, achieving over 90% attack success rate with fewer than 15 queries per attempt on average.
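
The abstract describes a Markov matrix whose transition probabilities adapt to observed outcomes, but it does not give the update rule. The sketch below shows only that generic mechanism over abstract integer states: a row-stochastic matrix, sampling of the next state, and a feedback update. The class name `AdaptiveMarkovSelector`, the `learning_rate` parameter, the uniform initialization, and the multiplicative reweighting are all assumptions made for illustration, not the paper's method.

```python
import random


class AdaptiveMarkovSelector:
    """Row-stochastic transition matrix over abstract states 0..n-1.

    Hypothetical illustration: the update rule is an assumed multiplicative
    reweighting, not the rule used in the paper.
    """

    def __init__(self, n_states, learning_rate=0.2, seed=None):
        if n_states < 2:
            raise ValueError("need at least two states")
        if not 0.0 < learning_rate < 1.0:
            raise ValueError("learning_rate must be in (0, 1)")
        self.n = n_states
        self.lr = learning_rate
        self.rng = random.Random(seed)
        # Assumed initialization: every transition is equally likely.
        self.matrix = [[1.0 / n_states] * n_states for _ in range(n_states)]

    def next_state(self, current):
        """Sample the next state from the row of the current state."""
        return self.rng.choices(range(self.n), weights=self.matrix[current], k=1)[0]

    def update(self, src, dst, success):
        """Raise or lower P(src -> dst) by a factor, then renormalize the row."""
        factor = 1.0 + self.lr if success else 1.0 - self.lr
        row = list(self.matrix[src])
        row[dst] *= factor
        total = sum(row)
        self.matrix[src] = [p / total for p in row]


selector = AdaptiveMarkovSelector(n_states=3, learning_rate=0.2, seed=0)
selector.update(src=0, dst=1, success=True)   # transition 0 -> 1 becomes more likely
selector.update(src=0, dst=2, success=False)  # transition 0 -> 2 becomes less likely
```

Renormalizing after each update keeps every row a valid probability distribution, so the matrix can be sampled at any point. The success signal is supplied by the caller; how outcomes are judged is outside the scope of this sketch.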

Problem

Research questions and friction points this paper is trying to address.

Addresses vulnerability of LLMs to jailbreaking attacks

Overcomes limitations of static, non-adaptive jailbreaking strategies

Enhances attack success rate with dynamic strategy adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Markovian adaptive framework for jailbreaking

Iterative composition of diverse disguise strategies

Dynamic Markov matrix guides strategy selection

🔎 Similar Papers

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs