PICO: Secure Transformers via Robust Prompt Isolation and Cybersecurity Oversight

📅 2025-04-26

📈 Citations: 0

✨ Influential: 0

career value

178K/year

🤖 AI Summary

This paper addresses the reliability degradation of large language model (LLM) responses caused by prompt injection attacks. Methodologically, it proposes an injection-resistant dual-channel Transformer architecture: (1) a structured prompt isolation paradigm that separates trusted system instructions and untrusted user inputs into distinct channels; (2) a gated fusion mechanism coupled with a provably invariant system-instruction branch to preserve instruction integrity; and (3) integration of a Mixture-of-Experts (MoE) security expert module with cybersecurity knowledge graph (CKG)-guided dynamic reasoning for domain-aware defense. Experiments demonstrate a 99.2% defense success rate against representative attacks such as Policy Puppetry, zero-shot cross-domain transfer capability, and a complete deployment pipeline—from pretraining to efficient fine-tuning—ensuring practical applicability and robustness.

Technology Category

Application Category

📝 Abstract

We propose a robust transformer architecture designed to prevent prompt injection attacks and ensure secure, reliable response generation. Our PICO (Prompt Isolation and Cybersecurity Oversight) framework structurally separates trusted system instructions from untrusted user inputs through dual channels that are processed independently and merged only by a controlled, gated fusion mechanism. In addition, we integrate a specialized Security Expert Agent within a Mixture-of-Experts (MoE) framework and incorporate a Cybersecurity Knowledge Graph (CKG) to supply domain-specific reasoning. Our training design further ensures that the system prompt branch remains immutable while the rest of the network learns to handle adversarial inputs safely. This PICO framework is presented via a general mathematical formulation, then elaborated in terms of the specifics of transformer architecture, and fleshed out via hypothetical case studies including Policy Puppetry attacks. While the most effective implementation may involve training transformers in a PICO-based way from scratch, we also present a cost-effective fine-tuning approach.

Problem

Research questions and friction points this paper is trying to address.

Prevent prompt injection attacks in transformers

Ensure secure response generation via dual-channel processing

Integrate cybersecurity oversight with immutable system prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-channel architecture isolates trusted and untrusted inputs

Security Expert Agent and Cybersecurity Knowledge Graph integration

Immutable system prompt branch with adversarial input handling

🔎 Similar Papers

Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits