AI Summary
This work uncovers a fundamental discrepancy between human readability and model perception of stylized fonts, a previously unrecognized adversarial attack surface in NLP. To exploit it, the authors propose Style Attack Disguise (SAD), the first font-style-based adversarial attack framework. SAD operates via character-level stylistic substitution and token-mapping discrepancy analysis, enabling effective, stealthy, and query-efficient attacks against traditional models, large language models (LLMs), and commercial APIs, while preserving semantic integrity and human readability. It supports two operational modes: lightweight and strong-attack. Empirical evaluation across sentiment analysis, machine translation, and multimodal generation demonstrates high attack success rates and substantial degradation in system performance. Crucially, SAD is the first systematic investigation to expose severe font-robustness vulnerabilities in state-of-the-art NLP and multimodal models.
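The character-level stylistic substitution the summary describes can be illustrated with a minimal sketch (not the authors' code): plain ASCII letters are swapped for visually similar Unicode Mathematical Bold letters. A human reads the styled text effortlessly, but to a tokenizer every letter is now a different code point, so the token sequence changes entirely. The mapping choice (Mathematical Bold) and the function name `stylize` are illustrative assumptions.

```python
import unicodedata

# Illustrative offsets into the Mathematical Alphanumeric Symbols block.
BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A

def stylize(text: str) -> str:
    """Replace ASCII letters with their Mathematical Bold look-alikes."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, punctuation, and spaces left untouched
    return "".join(out)

if __name__ == "__main__":
    plain = "This movie is great"
    styled = stylize(plain)
    print(styled)
    # NFKC normalization folds the bold letters back to ASCII, confirming
    # the substitution is purely presentational for a human reader.
    print(unicodedata.normalize("NFKC", styled) == plain)
```

Because subword tokenizers are trained almost exclusively on ASCII text, the styled string is split into rare byte-level fragments, which is the perception gap SAD exploits; NFKC normalization at the input boundary is one obvious, if partial, defense.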
Abstract
With the growth of social media, users employ stylistic fonts and font-like emoji to express individuality, creating visually appealing text that remains human-readable. However, these fonts introduce hidden vulnerabilities in NLP models: while humans easily read stylistic text, models process these characters as distinct tokens, causing interference. We identify this human-model perception gap and propose a style-based attack, Style Attack Disguise (SAD). We design two variants: a light one for query efficiency and a strong one for superior attack performance. Experiments on sentiment classification and machine translation across traditional models, LLMs, and commercial services demonstrate SAD's strong attack performance. We also show SAD's potential threats to multimodal tasks, including text-to-image and text-to-speech generation.