Towards Class-wise Robustness Analysis

📅 2024-11-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks exhibit significant class-wise robustness disparities under domain shifts—such as data corruptions and adversarial attacks—undermining overall model security. Method: This paper systematically investigates the vulnerability mechanisms of individual classes in adversarially trained models and proposes Class-wise False Positive Score (CFPS), the first metric designed to fairly quantify class-specific susceptibility to attacks, moving beyond conventional aggregate robustness evaluation. We analyze class-level error patterns, probe latent-space structure, and conduct extensive benchmarking across diverse corruptions and adversarial attacks. Contribution/Results: Empirical results demonstrate a strong correlation between CFPS and actual class vulnerability, confirming that robustness is inherently non-uniform across classes. Our analysis uncovers implicit biases embedded in adversarial training, revealing their root causes. The proposed CFPS provides a novel, interpretable, and practical dimension for robust model design, security assessment, and explainability analysis—enabling fine-grained, class-aware evaluation of model resilience.

📝 Abstract
Although deep neural networks are very successful at solving many downstream tasks, their application in real-life scenarios is limited by their susceptibility to domain shifts such as common corruptions and adversarial attacks. The existence of adversarial examples and data corruption significantly reduces the performance of deep classification models. Researchers have made strides in developing robust neural architectures to bolster the decisions of deep classifiers. However, most of these works rely on effective adversarial training methods and predominantly focus on overall model robustness, disregarding critical class-wise differences in robustness. Weakly robust classes offer attackers a potential avenue for fooling image recognition models. This study therefore investigates class-to-class biases across adversarially trained robust classification models to understand their latent-space structures and to analyze their strong and weak class-wise properties. We further assess the robustness of classes against common corruptions and adversarial attacks, recognizing that class vulnerability extends beyond the number of correct classifications for a specific class. We find that the number of false positives a class receives as a target class significantly impacts its vulnerability to attacks. Through our analysis of the Class False Positive Score, we provide a fair evaluation of how susceptible each class is to misclassification.
Problem

Research questions and friction points this paper is trying to address.

Analyzes class-wise robustness in adversarially trained models.
Investigates biases and vulnerabilities across different classes.
Assesses class susceptibility to misclassification and attacks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes class-wise robustness in neural networks.
Focuses on class vulnerability to adversarial attacks.
Introduces the Class False Positive Score for evaluation.
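The core idea behind the Class False Positive Score is to count, per class, how often that class is wrongly predicted as the target. A minimal sketch of such a per-class false-positive metric is below; the normalization by the number of out-of-class samples is an illustrative assumption, and the paper's exact formula may differ:

```python
import numpy as np

def class_false_positive_scores(y_true, y_pred, num_classes):
    """Per-class false-positive rates: for each class c, the fraction of
    samples NOT belonging to c that were nevertheless predicted as c.
    Normalizing by the out-of-class sample count is an assumption made
    here for illustration; the paper may define the score differently."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = np.zeros(num_classes)
    for c in range(num_classes):
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives for class c
        negatives = np.sum(y_true != c)             # samples not of class c
        scores[c] = fp / negatives if negatives else 0.0
    return scores
```

A class with a high score attracts misclassifications from other classes, which, per the abstract, signals elevated vulnerability to targeted attacks.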