🤖 AI Summary
Large language models (LLMs) are vulnerable to prompt injection attacks because instructions and input data are not explicitly separated at the architectural level. To address this, the authors propose ASIDE, an architecture-level change that decouples instruction and data representations without requiring pretraining from scratch. Rather than learning new embeddings, ASIDE reuses the original embedding layer twice, applying a fixed orthogonal rotation to one copy, so that instruction tokens and data tokens flow through two parallel embedding paths. Experiments show substantially increased instruction-data separation scores without loss of model capabilities, and competitive results on standard prompt injection benchmarks even without dedicated safety training. A representation-space analysis sheds light on the mechanism behind these gains.
📝 Abstract
Despite their remarkable performance, large language models lack elementary safety features, which makes them susceptible to numerous malicious attacks. In particular, previous work has identified the absence of an intrinsic separation between instructions and data as a root cause of the success of prompt injection attacks. In this work, we propose an architectural change, ASIDE, that allows the model to clearly separate instructions from data by using separate embeddings for them. Instead of training the embeddings from scratch, we propose a method to convert an existing model to ASIDE form by using two copies of the original model's embedding layer and applying an orthogonal rotation to one of them. We demonstrate the effectiveness of our method by showing (1) highly increased instruction-data separation scores without a loss in model capabilities and (2) competitive results on prompt injection benchmarks, even without dedicated safety training. Additionally, we study the working mechanism behind our method through an analysis of model representations.
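The core mechanism described above can be sketched in a few lines: keep one pretrained embedding table for instruction tokens, rotate a second copy of the same table by a fixed orthogonal matrix for data tokens, and route each token through the path matching its role. This is a minimal illustration under assumptions, not the paper's implementation; the table sizes, the random orthogonal rotation, and the `embed` helper are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 10, 4

# Pretend pretrained embedding table (instruction path).
E_inst = rng.standard_normal((vocab, dim))

# Fixed orthogonal rotation: QR of a random matrix yields an orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

# Data-path table: the same embeddings, rotated (no new parameters learned).
E_data = E_inst @ Q

def embed(token_ids, is_data):
    """Look up each token in the instruction or data table by its role."""
    return np.where(is_data[:, None], E_data[token_ids], E_inst[token_ids])

tokens = np.array([1, 2, 3])
roles = np.array([False, False, True])  # last token comes from untrusted data
out = embed(tokens, roles)

# Because the rotation is orthogonal, token embedding norms are unchanged,
# which is one reason the conversion can preserve model capabilities.
assert np.allclose(np.linalg.norm(E_data, axis=1),
                   np.linalg.norm(E_inst, axis=1))
```

The same token id thus maps to two distinct but geometrically related vectors depending on whether it arrives as an instruction or as data, giving downstream layers an explicit signal of token provenance.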