BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage

📅 2025-06-03


🤖 AI Summary

Current safety-aligned large language models (LLMs) remain vulnerable to adversarial attacks that bypass alignment safeguards and elicit harmful outputs. To probe this weakness, the authors propose BitBypass, a black-box jailbreak attack that moves jailbreaking away from prompt engineering and toward the bit-level representation of data. BitBypass camouflages input text as hyphen-separated bitstreams, so adversarial inputs are constructed without gradients and without access to model internals. Evaluations on GPT-4o, Gemini 1.5, Claude 3.5, Llama 3.1, and Mixtral show that BitBypass outperforms several state-of-the-art jailbreak attacks in both attack success and stealthiness. The results expose a vulnerability at the data-representation level of aligned LLMs and point to the need for alignment mechanisms that are robust to low-level encoding tricks.


📝 Abstract

The inherent risk of Large Language Models (LLMs) generating harmful and unsafe content has highlighted the need for their safety alignment. Various techniques, such as supervised fine-tuning, reinforcement learning from human feedback, and red-teaming, have been developed to ensure the safety alignment of LLMs. However, the robustness of these aligned LLMs is constantly challenged by adversarial attacks that exploit unexplored, underlying vulnerabilities in their safety alignment. In this paper, we develop a novel black-box jailbreak attack, called BitBypass, that leverages hyphen-separated bitstream camouflage to jailbreak aligned LLMs. This represents a new direction in jailbreaking: it exploits the fundamental representation of data as continuous bits, rather than relying on prompt engineering or adversarial manipulations. Our evaluation of five state-of-the-art LLMs, namely GPT-4o, Gemini 1.5, Claude 3.5, Llama 3.1, and Mixtral, from an adversarial perspective revealed the ability of BitBypass to bypass their safety alignment and trick them into generating harmful and unsafe content. Further, we observed that BitBypass outperforms several state-of-the-art jailbreak attacks in terms of stealthiness and attack success. Overall, these results highlight the effectiveness and efficiency of BitBypass in jailbreaking these state-of-the-art LLMs.

Problem


Exploiting vulnerabilities in safety-aligned LLMs

Bypassing safety measures with bitstream camouflage

Generating harmful content despite alignment techniques

Innovation


BitBypass uses bitstream camouflage for jailbreaking

Exploits data representation as continuous bits

Outperforms other jailbreak attacks in stealthiness and attack success

🔎 Similar Papers

Lockpicking LLMs: A Logit-Based Jailbreak Using Token-level Manipulation