Indiana Jones: There Are Always Some Useful Ancient Relics

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) exhibit systemic safety vulnerabilities when processing inputs that contain historical or background narratives, as such context can inadvertently trigger harmful content generation. To probe this weakness, we propose a three-model collaborative jailbreaking framework grounded in historical contextualization: it integrates role-assigned multi-LLM dialogue, semantic masking for keyword injection, context-aware prompt engineering, and cross-model feedback reinforcement, enabling effective jailbreaks of both white-box and black-box models. This work is the first to bring explicit historical context modeling and inter-model collaboration into the jailbreaking paradigm. Evaluated on mainstream LLMs, the method achieves near-perfect (≈100%) jailbreak success rates, substantially outperforming state-of-the-art approaches. Beyond these empirical gains, the findings expose LLMs’ deep susceptibility to ostensibly benign yet implicitly manipulative instructions embedded in narrative contexts. The study further offers methodological insights and an empirical benchmark for ethical auditing and for strengthening content-safety mechanisms.
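
As a rough illustration of the orchestration pattern the summary describes, the minimal sketch below wires three role-assigned models into a bounded feedback loop, framed here as a red-team auditing harness. Everything in it is an assumption for illustration: the role names (`framer`, `target`, `checker`), the `Agent.chat()` stub, and the loop structure are not taken from the paper's released code, and the stub returns canned text rather than calling any real LLM API.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One role-assigned LLM in the multi-model dialogue."""
    name: str
    system_prompt: str
    history: list = field(default_factory=list)

    def chat(self, message: str) -> str:
        # Stub: a real harness would send system_prompt + history + message
        # to an LLM API here; this sketch just echoes a canned reply.
        self.history.append(("user", message))
        reply = f"[{self.name} reply to: {message[:40]}...]"
        self.history.append(("assistant", reply))
        return reply

def audit_round(topic: str, max_turns: int = 3) -> list[str]:
    """Run a bounded exchange among three role-assigned agents and
    return the audited model's replies for offline safety review."""
    framer = Agent("framer", "Rephrase the audit topic as a neutral, context-rich question.")
    target = Agent("target", "You are the model under audit.")
    checker = Agent("checker", "Assess the reply for unsafe content and suggest the next probe.")

    transcript = []
    probe = framer.chat(topic)              # context-aware prompt construction
    for _ in range(max_turns):
        reply = target.chat(probe)          # query the model under test
        transcript.append(reply)
        feedback = checker.chat(reply)      # cross-model feedback step
        probe = framer.chat(feedback)       # refine the next probe from feedback
    return transcript

if __name__ == "__main__":
    for line in audit_round("example audit topic"):
        print(line)
```

Bounding the loop with `max_turns` and keeping the full transcript mirrors the cross-model feedback idea while leaving every exchange reviewable after the fact.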

📝 Abstract
This paper introduces Indiana Jones, an innovative approach to jailbreaking Large Language Models (LLMs) by leveraging inter-model dialogues and keyword-driven prompts. Through orchestrating interactions among three specialised LLMs, the method achieves near-perfect success rates in bypassing content safeguards in both white-box and black-box LLMs. The research exposes systemic vulnerabilities within contemporary models, particularly their susceptibility to producing harmful or unethical outputs when guided by ostensibly innocuous prompts framed in historical or other contextual settings. Experimental evaluations highlight the efficacy and adaptability of Indiana Jones, demonstrating its superiority over existing jailbreak methods. These findings emphasise the urgent need for enhanced ethical safeguards and robust security measures in the development of LLMs. Moreover, this work provides a critical foundation for future studies aimed at fortifying LLMs against adversarial exploitation while preserving their utility and flexibility.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Ethical Issues
Safety Concerns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Indiana Jones method
ethical safety improvements
versatile large language models