🤖 AI Summary
This work addresses the non-convergence and instability of existing sign-based adversarial attack optimizers, such as I-FGSM and MI-FGSM, two problems that limit the transferability of the resulting attacks. From an optimization perspective, the authors reformulate sign-based optimizers as coordinate-wise gradient descent and introduce a Monotonically Decreasing Coordinate-wise Step-size (MDCS) mechanism. They provide the first theoretical guarantee of an optimal O(1/√T) convergence rate for MDCS-MI, where T is the number of iterations. Experimental results demonstrate that the proposed method significantly improves attack success rate, stability, and cross-model transferability on both image classification and cross-modal retrieval tasks.
📝 Abstract
Crafting adversarial examples can be formulated as an optimization problem. While sign-based optimizers such as I-FGSM and MI-FGSM have become the de facto standard for the induced optimization problems, several problems remain unsolved in their theoretical grounding and practical reliability, notably non-convergence and instability, which inevitably affect their transferability. Contrary to expectation, we observe that the attack success rate may degrade sharply as the number of iterations increases. In this paper, we address these issues from an optimization perspective. By reformulating the sign-based optimizer as a specific coordinate-wise gradient descent, we argue that one cause of non-convergence and instability is its non-decaying step-size schedule. Based on this viewpoint, we propose a series of new attack algorithms that enforce Monotonically Decreasing Coordinate-wise Step-sizes (MDCS) within sign-based optimizers. In particular, we provide theoretical guarantees proving that MDCS-MI attains an optimal convergence rate of $O(1/\sqrt{T})$, where $T$ is the number of iterations. Extensive experiments on image classification and cross-modal retrieval tasks demonstrate that our approach not only significantly improves transferability but also enhances attack stability compared to state-of-the-art sign-based methods.
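
The reformulation mentioned in the abstract rests on the identity sign(g_i) = g_i / |g_i|: a sign step of size alpha equals a gradient step whose coordinate i uses the step-size alpha / |g_i|, which does not decay over iterations. The NumPy sketch below illustrates that identity. It is not the paper's algorithm: the function names, the value of alpha, and in particular the AdaGrad-style accumulator used to produce non-increasing step-sizes are assumptions chosen for illustration, since the abstract does not state the actual MDCS rule.

```python
import numpy as np


def sign_update(grad, alpha):
    """Sign-based step as used in I-FGSM-style attacks: alpha * sign(grad)."""
    return alpha * np.sign(grad)


def coordinatewise_update(grad, alpha, tiny=1e-12):
    """The same step written as coordinate-wise gradient descent.

    Coordinate i uses the step-size alpha / |grad_i|. `tiny` only guards
    against division by zero; a zero gradient entry still yields a zero step.
    """
    step_sizes = alpha / np.maximum(np.abs(grad), tiny)
    return step_sizes * grad


def nonincreasing_step_sizes(grads, alpha, tiny=1e-12):
    """Hypothetical schedule (an assumption, not the paper's MDCS rule).

    Dividing alpha by the root of the accumulated squared gradients gives
    per-coordinate step-sizes that can never grow from one iteration to the
    next, and that match alpha / |grad_i| at the first iteration.
    """
    accum = np.zeros_like(grads[0], dtype=float)
    history = []
    for g in grads:
        accum += g ** 2
        history.append(alpha / np.maximum(np.sqrt(accum), tiny))
    return history


rng = np.random.default_rng(0)
grads = [rng.normal(size=5) for _ in range(4)]  # stand-in gradients
alpha = 2.0 / 255.0  # illustrative step-size, not taken from the paper

# The two formulations produce the same update.
assert np.allclose(sign_update(grads[0], alpha),
                   coordinatewise_update(grads[0], alpha))

# The hypothetical schedule is non-increasing in every coordinate.
sizes = nonincreasing_step_sizes(grads, alpha)
assert all(np.all(later <= earlier)
           for earlier, later in zip(sizes, sizes[1:]))
```

In the sign-based form, the implied step-size alpha / |g_i| grows whenever a gradient coordinate shrinks, which is one way to read the abstract's claim that a non-decaying schedule contributes to non-convergence and instability.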