🤖 AI Summary
Encoder stealing attacks suffer severe performance degradation under perturbation-based defenses. To address this, we propose BESA, an enhanced attack framework that incorporates perturbation recovery. BESA combines two modules: a lightweight perturbation detector that analyzes and classifies feature vectors to identify the defense in use, and a conditional generative model that removes the perturbation end to end. The modules are jointly optimized and can be plugged into existing stealing methods. BESA is the first attack to break multi-layer composite perturbation defenses, both theoretically and empirically, improving surrogate encoder accuracy by up to 24.63% across multiple benchmarks and outperforming state-of-the-art attacks. It also remains robust against both single-layer and composite perturbation defenses.
📝 Abstract
Perturbation-based defenses hinder the performance of encoder stealing attacks. To overcome them, we propose BESA, a boosting encoder stealing attack with perturbation recovery. The core of BESA consists of two modules, perturbation detection and perturbation recovery, which can be combined with canonical encoder stealing attacks. The perturbation detection module uses the feature vectors obtained from the target encoder to infer the defense mechanism employed by the service provider. Once the defense mechanism is detected, the perturbation recovery module uses a dedicated generative model to restore a clean feature vector from the perturbed one. Extensive evaluations on various datasets demonstrate that BESA improves the surrogate encoder accuracy of existing encoder stealing attacks by up to 24.63% when facing state-of-the-art defenses and combinations of multiple defenses.
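The detect-then-recover flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three candidate defenses, the low-rank stand-in for the encoder, the nearest-centroid detector, and the per-defense least-squares map (used here in place of the generative model) are all assumptions chosen to keep the example small. It also assumes the attacker can simulate each candidate defense on reference features.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, RANK = 32, 4
# Toy stand-in for an encoder (assumption): clean features lie in a rank-4 subspace.
W = rng.normal(size=(RANK, DIM))

def clean_features(n):
    return rng.normal(size=(n, RANK)) @ W

# Hypothetical candidate defenses applied to returned feature vectors.
DEFENSES = {
    "none": lambda f: f,
    "noise": lambda f: f + rng.normal(0.0, 0.5, size=f.shape),
    "round": lambda f: np.round(f),
}

# Reference features the attacker is assumed to hold (e.g. from a shadow setup).
ref = clean_features(2000)
# Top principal directions of the reference features span the clean subspace.
basis = np.linalg.svd(ref, full_matrices=False)[2][:RANK]

def stats(f):
    """Two per-vector statistics: share of integer entries, off-subspace norm."""
    frac_int = np.isclose(f, np.round(f)).mean(axis=1)
    resid = np.linalg.norm(f - (f @ basis.T) @ basis, axis=1)
    return np.stack([frac_int, resid], axis=1)

# Detection: nearest centroid in statistic space, one centroid per defense.
centroids = {name: stats(d(ref)).mean(axis=0) for name, d in DEFENSES.items()}

def detect(f):
    names = list(centroids)
    s = stats(f)
    dists = np.stack(
        [np.linalg.norm(s - centroids[k], axis=1) for k in names], axis=1
    )
    return [names[i] for i in dists.argmin(axis=1)]

# Recovery: one linear map per defense, fitted on simulated (perturbed, clean) pairs.
# A linear map is a deliberately simple substitute for a generative model.
recover_maps = {
    name: np.linalg.lstsq(d(ref), ref, rcond=None)[0]
    for name, d in DEFENSES.items()
}

def recover(f):
    labels = detect(f)
    return np.stack([v @ recover_maps[k] for v, k in zip(f, labels)])

if __name__ == "__main__":
    test = clean_features(300)
    for name, d in DEFENSES.items():
        p = d(test)
        acc = np.mean([k == name for k in detect(p)])
        before = np.mean((p - test) ** 2)
        after = np.mean((recover(p) - test) ** 2)
        print(f"{name}: detect acc={acc:.2f}, mse {before:.4f} -> {after:.4f}")
```

In an actual attack, the recovered vectors rather than the perturbed ones would serve as training targets for the surrogate encoder. The conditioning on the detected defense is the part this sketch is meant to show: the recovery step picks a different map depending on what the detector reports.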