🤖 AI Summary
This work examines how sensitive large language models are to perturbations of input tokenization, including adversarial ones. It shows that a model trained only on the standard (canonical) tokenization degrades substantially when evaluated on non-canonical tokenizations (e.g., a 29.8% drop in accuracy for Llama-1B). To mitigate this vulnerability, the authors study uniformly sampled stochastic tokenization across three learning regimes: pre-training, supervised fine-tuning, and in-context learning. Across multiple model architectures and datasets, they report that pre-training and fine-tuning with uniformly sampled tokenizations improve robustness to both random and adversarial tokenization perturbations, while preserving accuracy and adding no inference cost.
📝 Abstract
The widespread adoption of large language models (LLMs) has heightened concerns about their robustness. Vulnerabilities to perturbations of the input tokenisation indicate that models trained with a deterministic canonical tokenisation can be brittle to adversarial attacks. Recent studies suggest that stochastic tokenisation can deliver internal representations that are less sensitive to perturbations. In this paper, we analyse how stochastic tokenisations affect robustness to adversarial attacks and random perturbations. We systematically study this across a range of learning regimes (pre-training, supervised fine-tuning, and in-context learning), data sets, and model architectures. We show that pre-training and fine-tuning with uniformly sampled stochastic tokenisations improve robustness to random and adversarial perturbations. Evaluating on uniformly sampled non-canonical tokenisations reduces the accuracy of a canonically trained Llama-1b model by 29.8%. We find that training with stochastic tokenisation preserves accuracy without increasing inference cost.
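To make "uniformly sampled tokenisation" concrete, the sketch below draws one segmentation of a string uniformly at random from all segmentations that a vocabulary allows. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not describe the sampler, the function names `count_suffix_segmentations` and `sample_uniform_tokenisation` are hypothetical, and the toy vocabulary works on characters rather than the byte-level subwords a real LLM tokeniser would use.

```python
import random


def count_suffix_segmentations(text, vocab):
    """Return ways, where ways[i] is the number of ways to split text[i:] into vocab tokens."""
    n = len(text)
    max_len = max((len(t) for t in vocab), default=0)
    ways = [0] * (n + 1)
    ways[n] = 1  # the empty suffix has exactly one (empty) segmentation
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, min(n, i + max_len) + 1):
            if text[i:j] in vocab:
                ways[i] += ways[j]
    return ways


def sample_uniform_tokenisation(text, vocab, rng=None):
    """Sample one segmentation of text uniformly from all valid segmentations."""
    rng = rng or random.Random()
    ways = count_suffix_segmentations(text, vocab)
    if ways[0] == 0:
        raise ValueError("text cannot be segmented with this vocabulary")
    max_len = max((len(t) for t in vocab), default=0)
    tokens, i = [], 0
    while i < len(text):
        # Pick the next token text[i:j] with probability ways[j] / ways[i];
        # chaining these choices makes every full segmentation equally likely.
        r = rng.randrange(ways[i])
        for j in range(i + 1, min(len(text), i + max_len) + 1):
            if text[i:j] in vocab:
                if r < ways[j]:
                    tokens.append(text[i:j])
                    i = j
                    break
                r -= ways[j]
    return tokens


if __name__ == "__main__":
    toy_vocab = {"a", "b", "ab"}  # hypothetical toy vocabulary
    print(sample_uniform_tokenisation("abab", toy_vocab, random.Random(0)))
```

The backward count followed by a forward draw avoids enumerating every segmentation, whose number can grow exponentially with the length of the text. Under this reading, a canonical tokeniser would always return the same segmentation, whereas stochastic training or evaluation would call a sampler like this one on each example.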