AI Summary
To address the challenges of oscillating surrogate model accuracy and difficulty in selecting the optimal surrogate model without access to the target model's real data in data-free model extraction, this paper proposes MetaDFME, a meta-learning-based data-free model extraction framework. MetaDFME employs meta-learning to optimize generator training, enabling rapid adaptation of synthesized samples to the target model by learning a meta-representation of synthetic data. It further integrates adversarial generation with distribution alignment to significantly mitigate distribution shift in synthetic data. Extensive experiments on MNIST, SVHN, CIFAR-10, and CIFAR-100 demonstrate that MetaDFME outperforms existing state-of-the-art methods in both extraction stability and attack success rate: surrogate model accuracy variance is reduced by 42%, and average accuracy improves by 3.7 percentage points.
Abstract
Model extraction is a severe threat to Machine Learning-as-a-Service systems, especially through data-free approaches, where dishonest users can replicate the functionality of a black-box target model without access to realistic data. Despite recent advancements, existing data-free model extraction methods suffer from oscillating substitute-model accuracy. This oscillation, which could be attributed to the constant shift in the generated data distribution during the attack, makes the attack impractical, since the optimal substitute model cannot be determined without access to the target model's in-distribution data. Hence, we propose MetaDFME, a novel data-free model extraction method that employs meta-learning in the generator training to reduce the distribution shift, aiming to mitigate the oscillation in the substitute model's accuracy. In detail, we train our generator to iteratively capture the meta-representations of the synthetic data during the attack. These meta-representations can be adapted in a few steps to produce data that helps the substitute model learn from the target model while reducing the effect of distribution shifts. Our experiments on popular benchmark image datasets, MNIST, SVHN, CIFAR-10, and CIFAR-100, demonstrate that MetaDFME outperforms the current state-of-the-art data-free model extraction method while yielding more stable substitute-model accuracy during the attack.
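The abstract does not spell out the meta-learning update, so the following is only a minimal sketch of the general idea of "meta-parameters that adapt in a few steps". It assumes a Reptile-style first-order meta-update as a stand-in, not MetaDFME's actual algorithm, and a toy quadratic loss in place of a real generator objective. All names (`inner_adapt`, `reptile_step`, `meta_train`) and all hyperparameters are hypothetical.

```python
import random

def inner_adapt(theta, target, steps=3, lr=0.1):
    """Take a few gradient steps on the toy loss 0.5 * (w - target)^2.

    Assumption: each `target` stands in for the objective the generator
    faces at one stage of the attack; this is not the paper's loss.
    """
    w = list(theta)
    for _ in range(steps):
        # Gradient of 0.5 * (w - t)^2 with respect to w is (w - t).
        w = [wi - lr * (wi - ti) for wi, ti in zip(w, target)]
    return w

def reptile_step(theta, target, meta_lr=0.5, steps=3, lr=0.1):
    """Reptile-style meta-update: move theta toward the adapted weights."""
    adapted = inner_adapt(theta, target, steps, lr)
    return [t + meta_lr * (a - t) for t, a in zip(theta, adapted)]

def meta_train(targets, iterations=500, seed=0):
    """Repeatedly sample a toy objective and apply one meta-update."""
    rng = random.Random(seed)
    theta = [0.0] * len(targets[0])
    for _ in range(iterations):
        theta = reptile_step(theta, rng.choice(targets))
    return theta

# Two toy objectives whose optima differ, mimicking a shifting distribution.
targets = [[1.0, -1.0], [3.0, 1.0]]
theta = meta_train(targets)
print("meta-parameters:", theta)
print("adapted to first objective:", inner_adapt(theta, targets[0]))
```

In this toy setting the meta-parameters settle between the two optima, so a few inner steps move them toward either objective without starting from scratch. That is the intuition behind adapting a meta-representation rather than retraining the generator as the data distribution shifts.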