TrojanDec: Data-free Detection of Trojan Inputs in Self-supervised Learning

📅 2025-01-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Self-supervised image encoders are vulnerable to backdoor attacks, yet existing defenses typically require access to training data or downstream models, resources that are often unavailable in real-world deployment.

Method: This paper proposes the first input-level, training-data-free framework for detecting inputs that carry a backdoor trigger and recovering them. It identifies triggers zero-shot via feature distribution shift analysis, then applies latent-space reconstruction optimization guided by adversarial robustness constraints to achieve reversible, high-fidelity trigger removal.

Contribution/Results: The approach removes the reliance on training data and task-specific models, enabling data-agnostic detection of backdoored inputs and secure recovery. Evaluated against multiple state-of-the-art backdoor attacks, it achieves >94% detection accuracy, preserves high input reconstruction quality, and incurs negligible degradation (<0.5%) in downstream classification accuracy, substantially outperforming prior defenses.

📝 Abstract

An image encoder pre-trained by self-supervised learning can be used as a general-purpose feature extractor to build downstream classifiers for various downstream tasks. However, many studies have shown that an attacker can embed a trojan into an encoder such that multiple downstream classifiers built on the trojaned encoder simultaneously inherit the trojan behavior. In this work, we propose TrojanDec, the first data-free method to identify and recover a test input embedded with a trigger. Given a (trojaned or clean) encoder and a test input, TrojanDec first predicts whether the test input is trojaned. If not, the test input is processed normally to preserve utility. Otherwise, the test input is restored to remove the trigger. Our extensive evaluation shows that TrojanDec can effectively identify the trojan (if any) in a given test input and recover the input under state-of-the-art trojan attacks. We further demonstrate experimentally that TrojanDec outperforms state-of-the-art defenses.
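
The detect-then-restore flow described in the abstract can be sketched as a small gate placed in front of the encoder. This is only an illustration of the control flow, not TrojanDec's actual detector or restoration procedure, which this page does not specify. The names `guarded_encode`, `toy_encoder`, `toy_detector`, and `toy_restore` are hypothetical, and the toy "trigger" (a saturated last pixel) is an assumption made purely so the example runs.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration: a flattened image and a feature vector.
Image = List[float]
Features = List[float]
Encoder = Callable[[Image], Features]


def guarded_encode(
    encoder: Encoder,
    x: Image,
    is_trojaned: Callable[[Encoder, Image], bool],
    restore: Callable[[Encoder, Image], Image],
) -> Tuple[Features, bool]:
    """Encode x, restoring it first if the detector flags it.

    Clean inputs pass through untouched, so utility is preserved;
    flagged inputs are restored before they reach the encoder.
    """
    flagged = is_trojaned(encoder, x)
    if flagged:
        x = restore(encoder, x)
    return encoder(x), flagged


# Toy stand-ins (assumptions, not the paper's components).
def toy_encoder(x: Image) -> Features:
    return [sum(x), max(x)]


def toy_detector(encoder: Encoder, x: Image) -> bool:
    # Pretend the trigger is a saturated last pixel.
    return x[-1] >= 0.99


def toy_restore(encoder: Encoder, x: Image) -> Image:
    # Replace the suspicious pixel with the mean of the remaining pixels.
    rest = x[:-1]
    return rest + [sum(rest) / len(rest)]


clean = [0.25, 0.5, 0.75]
trojaned = [0.25, 0.5, 1.0]

clean_features, clean_flag = guarded_encode(toy_encoder, clean, toy_detector, toy_restore)
troj_features, troj_flag = guarded_encode(toy_encoder, trojaned, toy_detector, toy_restore)
```

Passing the detector and the restorer in as callables keeps the gate independent of how either step is implemented, which matches the abstract's framing: the same wrapper applies whether the encoder is trojaned or clean.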

Problem

Research questions and friction points this paper is trying to address.

Trojaned Pretrained Image Encoders

Detection

Repair

Innovation

Methods, ideas, or system contributions that make the work stand out.

TrojanDec

Self-supervised learning

Trojan input detection and recovery

🔎 Similar Papers

Packet Inspection Transformer: A Self-Supervised Journey to Unseen Malware Detection with Few Samples