🤖 AI Summary
This work addresses the lack of a unified evaluation framework for image steganography attacks and defenses, a gap that hinders quantitative assessment of real-world security risks. The authors propose SADBench, a systematic benchmark that evaluates steganographic methods, including those based on invertible neural networks (INNs) and autoencoders, together with diverse steganalysis detectors along four dimensions: attack capability, defense performance, computational efficiency, and transferability. Evaluations span varying cover distributions, compression conditions, and image- and text-based payloads. Experiments reveal critical asymmetries: in-domain detection is near-perfect and cheaper than stego generation; attacks transfer robustly to new distributions while detectors generalize poorly; and malicious payloads either survive minimal social media compression or adapt to aggressive compression via simulated training. SADBench simulates realistic social platform scenarios and provides a standardized foundation for steganographic security research.
📝 Abstract
Image steganography is widely used to protect user privacy and enable covert communication. However, adversaries can also abuse it as a covert channel to bypass content moderation, disseminate harmful semantics, and even hide malicious instructions in images that elicit dangerous outputs from large models, posing a practical and evolving security risk. To address the lack of a unified and systematic evaluation framework, we propose SADBench, a benchmark that assesses the adversary's ability to inject harmful secrets via steganography and the defender's ability to detect such threats through steganalysis. SADBench comprises four core tasks: steganography attack capability evaluation, steganalysis defense capability evaluation, efficiency evaluation, and transferability evaluation. It evaluates both image-payload and text-payload steganography across diverse cover distributions, using harmful visual semantics and toxic instructions to simulate malicious attacks. Across a broad set of attacks and detectors, SADBench reveals that (i) INN- and autoencoder-based methods are more stable than other architectures, (ii) in-domain detection is near-perfect and cheaper than generation, (iii) a critical asymmetry exists in transferability, where attacks generalize robustly to new distributions while detectors fail to adapt, and (iv) real-world threats persist on social media, where payloads either survive minimal compression or adapt to aggressive compression via simulated training. Overall, SADBench establishes a systematic, reproducible, and extensible framework for quantifying risks, paving the way for measurable, security-driven advances in steganography defense.