Identifying and Understanding Cross-Class Features in Adversarial Training

📅 2025-06-05

🤖 AI Summary

This work investigates the role of cross-class features in robustness evolution during adversarial training (AT). Addressing the lack of a unified explanation for the soft-label advantage and robust overfitting in AT, we introduce the novel concept of “cross-class features,” revealing that models initially prioritize learning shared features across classes to enhance robustness, but later shift toward class-specific features as overfitting emerges. Leveraging class-level feature attribution, synthetic data modeling, multi-architecture AT experiments, and dynamic tracking of robust loss, we establish an integrated theoretical–empirical framework. Our analysis identifies characteristic evolutionary patterns at the robustness inflection point and the overfitting transition, empirically validating the critical role of cross-class features in robust classification. To promote reproducibility, we release all code publicly.

📝 Abstract

Adversarial training (AT) is considered one of the most effective methods for making deep neural networks robust against adversarial attacks, yet its training mechanisms and dynamics remain open research problems. In this paper, we present a novel perspective on studying AT through the lens of class-wise feature attribution. Specifically, we identify the impact on AT of a key family of features that are shared by multiple classes, which we call cross-class features. These features are typically useful for robust classification, and we offer theoretical evidence for this through a synthetic data model. Through systematic studies across multiple model architectures and settings, we find that during the initial stage of AT, the model tends to learn more cross-class features until it reaches the best robustness checkpoint. As AT further squeezes the training robust loss and causes robust overfitting, the model tends to base its decisions on more class-specific features. Based on these discoveries, we further provide a unified view of two existing properties of AT: the advantage of soft-label training and robust overfitting. Overall, these insights refine the current understanding of AT mechanisms and provide new perspectives on studying them. Our code is available at https://github.com/PKU-ML/Cross-Class-Features-AT.
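
To make the idea of class-wise feature attribution concrete, the following is a minimal illustrative sketch, not the authors' released implementation. It assumes a model whose final layer is linear, so that the contribution of feature j to the logit of class c is the feature activation times the weight `weights[c][j]`. The function names (`class_attribution`, `attribution_similarity`) and the use of cosine similarity between per-class attribution vectors are assumptions made for illustration; a high similarity between two different classes would suggest that they rely on shared (cross-class) features.

```python
import math

def class_attribution(features, labels, weights, num_classes):
    """Hypothetical helper: A[c][j] is the mean, over samples of class c,
    of features[n][j] * weights[c][j] (assumes a linear final layer)."""
    dim = len(weights[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in zip(features, labels):
        counts[y] += 1
        for j in range(dim):
            sums[y][j] += x[j] * weights[y][j]
    # Guard against empty classes by dividing by at least 1.
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(num_classes)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def attribution_similarity(A):
    """Pairwise cosine similarity between class attribution vectors."""
    return [[cosine(a, b) for b in A] for a in A]

# Toy data (made up for illustration): feature 0 is used by classes 0 and 1,
# feature 1 only by class 0, and feature 2 only by class 2.
weights = [[1.0, 1.0, 0.0],
           [1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0]]
features = [[1.0, 1.0, 0.0],
            [1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
labels = [0, 1, 2]

A = class_attribution(features, labels, weights, num_classes=3)
C = attribution_similarity(A)
```

In this toy setup, classes 0 and 1 share feature 0, so `C[0][1]` is positive, while class 2 uses a disjoint feature and `C[0][2]` is zero. Tracking how the off-diagonal entries of such a matrix change across training checkpoints is one way to picture the shift from cross-class to class-specific features described in the abstract.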

Problem

Research questions and friction points this paper is trying to address.

Studying adversarial training via class-wise feature attribution

Identifying cross-class features' role in robust classification

Unifying properties of AT like soft-label training benefits

Innovation

Methods, ideas, or system contributions that make the work stand out.

Class-wise feature attribution analysis

Identifies the impact of cross-class features

Unifies the soft-label advantage and robust overfitting
