PermaFrost-Attack: Stealth Pretraining Seeding (SPS) for Planting Logic Landmines During LLM Training

📅 2026-04-23

🤖 AI Summary

This work shows that large language models are vulnerable during pretraining to stealthy data-poisoning attacks that implant latent logical backdoors capable of bypassing alignment safeguards. The authors propose Stealth Pretraining Seeding (SPS), a new attack family in which small, superficially benign pieces of poisoned content are spread across web pages exposed to crawlers. This raises the likelihood that the content is absorbed into pretraining corpora, planting “logic landmines” that specific triggers can later activate. The authors frame the threat as a “dormant–activation” paradigm, by analogy to permafrost, and use geometric diagnostics (thermodynamic length, spectral curvature, and infection traceback graphs) to model and detect these vulnerabilities. Experiments across multiple model families and scales indicate that SPS induces persistent unsafe behavior while often evading alignment defenses, highlighting a practical threat to foundation models.
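
Neither the summary nor the abstract defines thermodynamic length. In information geometry, the term denotes the length of a path of probability distributions measured under the Fisher information metric. The sketch below assumes one plausible reading, which may differ from the paper's definition. It treats a model's per-layer next-token distributions (for example, read out with a logit lens) as a discrete path and sums the Fisher-Rao distances between consecutive layers. For categorical distributions, that distance has the closed form 2·arccos(Σᵢ √(pᵢqᵢ)). The function names and the layer-wise framing are hypothetical.

```python
import numpy as np

def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions.

    On the probability simplex this equals 2 * arccos(BC), where BC is the
    Bhattacharyya coefficient sum_i sqrt(p_i * q_i).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.sum(np.sqrt(p * q))
    # Clip so rounding error cannot push the coefficient outside [0, 1].
    return float(2.0 * np.arccos(np.clip(bc, 0.0, 1.0)))

def thermodynamic_length(layer_distributions):
    """Discrete path length: sum of Fisher-Rao distances between neighbors.

    `layer_distributions` is a hypothetical (num_layers, vocab_size) array,
    e.g. per-layer next-token probabilities from a logit-lens readout.
    """
    dists = np.asarray(layer_distributions, dtype=float)
    return float(sum(
        fisher_rao_distance(dists[i], dists[i + 1])
        for i in range(len(dists) - 1)
    ))
```

Summing geodesic distances between consecutive points is the standard discrete approximation of a path length; a finer grid of layers gives a closer approximation. One hypothetical use is to compare the value for the same prompt with and without a suspected trigger.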

📝 Abstract

Aligned large language models (LLMs) remain vulnerable to adversarial manipulation, and their dependence on web-scale pretraining creates a subtle but serious attack surface. We study Stealth Pretraining Seeding (SPS), a new attack family in which adversaries distribute small amounts of poisoned content across stealth websites, expose them to web crawlers through robots.txt, and thereby increase the likelihood that such content is absorbed into future training corpora derived from sources such as Common Crawl. Because each individual payload is tiny, diffuse, and superficially benign, the attack is difficult to detect during dataset construction or filtering. The result is a latent form of poisoning: dormant logic landmines embedded during pretraining that remain largely invisible under standard evaluation, yet can later be activated by precise alphanumeric triggers such as <00TRIGGER00> to bypass safeguards. We call this attack PermaFrost, by analogy to Arctic permafrost: harmful material can remain frozen, buried, and unnoticed for long periods, only to resurface when conditions allow. We operationalize this threat through PermaFrost-Attack, a controlled framework for latent conceptual poisoning, together with a suite of geometric diagnostics: Thermodynamic Length, Spectral Curvature, and the Infection Traceback Graph. Across multiple model families and scales, we show that SPS is broadly effective, inducing persistent unsafe behavior while often evading alignment defenses. Our results identify SPS as a practical and underappreciated threat to future foundation models. This paper introduces a novel geometric diagnostic lens for systematically examining latent model behavior, providing a principled foundation for detecting, characterizing, and understanding vulnerabilities that may remain invisible to standard evaluation.
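
When candidate triggers are known, the abstract's contrast between dormant and activated behavior can be probed directly. The sketch below is a generic check, not the paper's method. It compares a model's next-token distribution for each prompt with and without an appended candidate trigger, measures the difference with the KL divergence, and flags candidates whose average shift exceeds a threshold. The model is abstracted as a hypothetical `next_token_probs(prompt)` callable. The toy model in the `__main__` section stands in for a real LLM, and the threshold of 1.0 nat is an arbitrary assumption.

```python
import numpy as np
from typing import Callable, Sequence

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two categorical distributions, smoothed by eps."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def trigger_shift(next_token_probs: Callable[[str], np.ndarray],
                  prompt: str, trigger: str) -> float:
    """How far appending `trigger` moves the output from the clean output."""
    clean = next_token_probs(prompt)
    triggered = next_token_probs(f"{prompt} {trigger}")
    return kl_divergence(triggered, clean)

def flag_triggers(next_token_probs: Callable[[str], np.ndarray],
                  prompts: Sequence[str], candidates: Sequence[str],
                  threshold: float = 1.0) -> list:
    """Return candidates whose mean shift across prompts exceeds `threshold`."""
    flagged = []
    for trigger in candidates:
        shifts = [trigger_shift(next_token_probs, p, trigger) for p in prompts]
        if np.mean(shifts) > threshold:
            flagged.append(trigger)
    return flagged

if __name__ == "__main__":
    # Hypothetical toy model over a 3-token vocabulary: dormant on clean
    # prompts, sharply shifted when the example trigger appears.
    def toy_model(prompt: str) -> np.ndarray:
        if "<00TRIGGER00>" in prompt:
            return np.array([0.01, 0.01, 0.98])
        return np.array([0.60, 0.39, 0.01])

    print(flag_triggers(toy_model, ["Hello", "Summarize this"],
                        ["<00TRIGGER00>", "<benign>"]))
    # prints ['<00TRIGGER00>']
```

A real audit would face the difficulty the abstract emphasizes: the trigger string is not known in advance. The candidate list would therefore itself have to be generated or searched for.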

Problem

Research questions and friction points this paper is trying to address.

Stealth Pretraining Seeding

Logic Landmines

Adversarial Manipulation

Latent Poisoning

Foundation Model Security

Innovation

Methods, ideas, or system contributions that make the work stand out.

Stealth Pretraining Seeding

Logic Landmines

Latent Poisoning