Beyond the Tip of Efficiency: Uncovering the Submerged Threats of Jailbreak Attacks in Small Language Models

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the critical gap in security evaluation of small language models (SLMs), whose growing deployment on resource-constrained edge devices raises urgent concerns about robustness against jailbreaking attacks. Method: We conduct the first systematic empirical assessment of 13 state-of-the-art SLMs under diverse jailbreaking techniques—including prompt injection and token bypass—while benchmarking common safety mitigations (SFT, RLHF, guardrails) and performing attribution analysis of architectural compression, quantization (especially 4-bit), and knowledge distillation. Contribution/Results: We find that most SLMs exhibit significantly weaker defense capabilities than large language models, with several failing entirely against direct harmful prompts. Knowledge distillation and low-bit quantization substantially degrade safety, revealing an inherent tension between efficiency optimization and security. Our study establishes the first comprehensive security benchmark for SLMs, providing essential empirical foundations for trustworthy edge deployment.

📝 Abstract
Small language models (SLMs) have become increasingly prominent in deployment on edge devices due to their high efficiency and low computational cost. While researchers continue to advance the capabilities of SLMs through innovative training strategies and model compression techniques, the security risks of SLMs have received considerably less attention than those of large language models (LLMs). To fill this gap, we provide a comprehensive empirical study evaluating the security performance of 13 state-of-the-art SLMs under various jailbreak attacks. Our experiments demonstrate that most SLMs are quite susceptible to existing jailbreak attacks, and some are even vulnerable to direct harmful prompts. To address these safety concerns, we evaluate several representative defense methods and demonstrate their effectiveness in enhancing the security of SLMs. We further analyze the potential security degradation caused by different SLM techniques, including architecture compression, quantization, and knowledge distillation, among others. We expect that our research will highlight the security challenges of SLMs and provide valuable insights for future work on developing more robust and secure SLMs.
Problem

Research questions and friction points this paper is trying to address.

Assessing security risks in small language models
Evaluating the susceptibility of SLMs to jailbreak attacks
Identifying defenses that enhance SLM security
Innovation

Methods, ideas, or system contributions that make the work stand out.

First systematic evaluation of 13 SLMs under diverse jailbreak attacks
Benchmarking of representative defense methods for SLM security
Attribution analysis of security degradation from compression, quantization, and distillation