Disentangling Content from Style to Overcome Shortcut Learning: A Hybrid Generative-Discriminative Learning Framework

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Self-supervised learning (SSL) generalizes poorly across domains because of shortcut learning, such as over-reliance on superficial cues like texture. Existing approaches align or separate domain-specific features only at the feature level and fail to address the underlying mechanism of shortcut dependence. This paper proposes HyGDL, a hybrid generative-discriminative framework that achieves intrinsic content-style disentanglement. Building on invariance-based pretraining, HyGDL uses a single encoder and explicitly defines style as the component of a representation orthogonal to its content, obtained via vector projection. During training, input-level style perturbations are paired with an invariant supervision signal to suppress shortcut learning at its source. Experiments show that HyGDL systematically mitigates shortcut bias and consistently outperforms state-of-the-art methods on multiple cross-domain transfer benchmarks, improving both structure-aware accuracy and generalization robustness and validating its principled disentanglement strategy.

📝 Abstract
Despite the remarkable success of Self-Supervised Learning (SSL), its generalization is fundamentally hindered by Shortcut Learning, where models exploit superficial features like texture instead of intrinsic structure. We experimentally verify this flaw within the generative paradigm (e.g., MAE) and argue it is a systemic issue also affecting discriminative methods, identifying it as the root cause of their failure on unseen domains. While existing methods often tackle this at a surface level by aligning or separating domain-specific features, they fail to alter the underlying learning mechanism that fosters shortcut dependency. To address this at its core, we propose HyGDL (Hybrid Generative-Discriminative Learning Framework), a hybrid framework that achieves explicit content-style disentanglement. Our approach is guided by the Invariance Pre-training Principle: forcing a model to learn an invariant essence by systematically varying a bias (e.g., style) at the input while keeping the supervision signal constant. HyGDL operates on a single encoder and analytically defines style as the component of a representation that is orthogonal to its style-invariant content, derived via vector projection.
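The Invariance Pre-training Principle stated in the abstract, varying a bias such as style at the input while holding the supervision signal constant, can be illustrated with a toy NumPy sketch. The channel gain-and-bias perturbation below is a hypothetical stand-in for the paper's style augmentations, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_perturb(x, rng):
    # Toy "style" change: a random gain and bias applied to the input,
    # a stand-in for texture/color perturbations (illustrative only).
    return rng.uniform(0.5, 1.5) * x + rng.uniform(-0.1, 0.1)

x = rng.standard_normal(8)
target = x.copy()                          # supervision signal held constant
views = [style_perturb(x, rng) for _ in range(3)]
# An encoder trained so that every styled view maps to the same target
# is rewarded for style-invariant (content) features, suppressing shortcuts.
```

The point of the sketch is only the training contract: the input's style varies across views, while the target never does.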
Problem

Research questions and friction points this paper is trying to address.

Overcoming shortcut learning in self-supervised methods
Achieving explicit content-style disentanglement in representations
Addressing superficial feature reliance instead of intrinsic structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid generative-discriminative learning framework
Explicit content-style disentanglement approach
Vector projection for style-invariant representation
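The vector-projection idea listed above, defining style as the component of a representation orthogonal to its content, can be sketched with plain vector algebra. The function name and the assumption that a single content direction `c` is available are illustrative; the paper's actual formulation may differ.

```python
import numpy as np

def disentangle(z, c):
    """Split representation z into a content component (projection of z
    onto a content direction c) and a style component (the residual,
    orthogonal to c). Illustrative sketch, not the paper's implementation."""
    c_hat = c / np.linalg.norm(c)
    content = np.dot(z, c_hat) * c_hat   # projection onto content direction
    style = z - content                  # orthogonal residual, i.e. style
    return content, style

z = np.array([3.0, 4.0])
c = np.array([1.0, 0.0])
content, style = disentangle(z, c)
# content → [3., 0.], style → [0., 4.]; content ⟂ style by construction
```

By construction `content + style == z` and the two components are orthogonal, which is exactly the analytic decomposition the framework relies on.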
Siming Fu
Zhejiang University
LLM, Long-tailed learning, Multi-modal
Sijun Dong
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China
Xiaoliang Meng
School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China