AI Summary
This work uncovers a fundamental discrepancy between human readability and model perception of stylized fonts, a previously unrecognized adversarial attack surface in NLP. To exploit it, the authors propose Style Attack Disguise (SAD), the first font-style-based adversarial attack framework. SAD operates via character-level stylistic substitution and token-mapping discrepancy analysis, enabling effective, stealthy, and query-efficient attacks against traditional models, large language models (LLMs), and commercial APIs, while preserving semantic integrity and human readability. It supports two operational modes: lightweight and strong-attack. Empirical evaluation across sentiment analysis, machine translation, and multimodal generation demonstrates high attack success rates and substantial degradation in system performance. Crucially, SAD is the first systematic investigation to expose severe font-robustness vulnerabilities in state-of-the-art NLP and multimodal models.
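The character-level stylistic substitution the summary describes can be illustrated with a minimal sketch (not the authors' code): plain ASCII letters are swapped for visually similar Unicode Mathematical Bold letters. A human reads the styled text effortlessly, but to a tokenizer every letter is now a different code point, so the token sequence changes entirely. The mapping choice (Mathematical Bold) and the function name `stylize` are illustrative assumptions.

```python
import unicodedata

# Illustrative offsets into the Mathematical Alphanumeric Symbols block.
BOLD_UPPER = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER = 0x1D41A  # MATHEMATICAL BOLD SMALL A

def stylize(text: str) -> str:
    """Replace ASCII letters with their Mathematical Bold look-alikes."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            out.append(chr(BOLD_UPPER + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(BOLD_LOWER + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, punctuation, and spaces left untouched
    return "".join(out)

if __name__ == "__main__":
    plain = "This movie is great"
    styled = stylize(plain)
    print(styled)
    # NFKC normalization folds the bold letters back to ASCII, confirming
    # the substitution is purely presentational for a human reader.
    print(unicodedata.normalize("NFKC", styled) == plain)
```

Because subword tokenizers are trained almost exclusively on ASCII text, the styled string is split into rare byte-level fragments, which is the perception gap SAD exploits; NFKC normalization at the input boundary is one obvious, if partial, defense.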
Abstract
With the growth of social media, users employ stylistic fonts and font-like emoji to express individuality, creating visually appealing text that remains human-readable. However, these fonts introduce hidden vulnerabilities in NLP models: while humans easily read stylistic text, models process these characters as distinct tokens, causing interference. We identify this human-model perception gap and propose a style-based attack, Style Attack Disguise (SAD). We design two variants: a light one for query efficiency and a strong one for superior attack performance. Experiments on sentiment classification and machine translation across traditional models, LLMs, and commercial services demonstrate SAD's strong attack performance. We also show SAD's potential threats to multimodal tasks, including text-to-image and text-to-speech generation.