CoBA: Counterbias Text Augmentation for Mitigating Various Spurious Correlations via Semantic Triples

📅 2025-08-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Deep learning models often rely on spurious correlations in training data, such as gender or simplicity biases, which leads to poor out-of-distribution (OOD) generalization. To address this, we propose CoBA, a unified counterbias augmentation framework that jointly mitigates multiple types of spurious correlations at the semantic triple level. CoBA decomposes text into subject-predicate-object triples, selectively modifies those triples to break non-target correlations, and reconstructs text from the modified triples to produce counterbias training samples. It requires no additional annotations or model modifications and is plug-and-play. Extensive experiments across diverse natural language understanding tasks demonstrate that CoBA significantly improves OOD robustness, alleviates multiple biases (e.g., gender, lexical, and syntactic), and yields stable, generalizable performance gains. These results empirically support semantic-structured augmentation as a way to disentangle spurious correlations.

📝 Abstract

Deep learning models often learn and exploit spurious correlations in training data, using these non-target features to inform their predictions. Such reliance leads to performance degradation and poor generalization on unseen data. To address these limitations, we introduce a more general form of counterfactual data augmentation, termed counterbias data augmentation, which simultaneously tackles multiple biases (e.g., gender bias, simplicity bias) and enhances out-of-distribution robustness. We present CoBA: CounterBias Augmentation, a unified framework that operates at the semantic triple level: first decomposing text into subject-predicate-object triples, then selectively modifying these triples to disrupt spurious correlations. By reconstructing the text from these adjusted triples, CoBA generates counterbias data that mitigates spurious patterns. Through extensive experiments, we demonstrate that CoBA not only improves downstream task performance, but also effectively reduces biases and strengthens out-of-distribution resilience, offering a versatile and robust solution to the challenges posed by spurious correlations.
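The decompose, modify, reconstruct loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `Triple` class, the hand-written triples (standing in for whatever extraction step the paper uses), the `GENDER_SWAP` lexicon, and the template-based `reconstruct` function are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical swap lexicon; the abstract does not say how triples are modified.
GENDER_SWAP = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "actor": "actress", "actress": "actor",
}


@dataclass(frozen=True)
class Triple:
    """A subject-predicate-object triple (illustrative structure only)."""
    subject: str
    predicate: str
    obj: str


def swap_terms(phrase: str) -> str:
    """Swap each word found in the lexicon; leave all other words unchanged."""
    return " ".join(GENDER_SWAP.get(word.lower(), word) for word in phrase.split())


def counterbias(triples: List[Triple]) -> List[Triple]:
    """Selectively modify subject and object slots, keeping the predicate intact."""
    return [
        replace(t, subject=swap_terms(t.subject), obj=swap_terms(t.obj))
        for t in triples
    ]


def reconstruct(triples: List[Triple]) -> str:
    """Naive template-based reconstruction: one short sentence per triple."""
    return " ".join(f"{t.subject} {t.predicate} {t.obj}." for t in triples)


# Hand-written triples standing in for the decomposition step.
triples = [
    Triple("the actor", "delivers", "a moving performance"),
    Triple("he", "carries", "the film"),
]
augmented = reconstruct(counterbias(triples))
print(augmented)
```

In this sketch the sentiment-bearing content ("a moving performance") is untouched while the gendered terms are swapped, so a classifier trained on both versions has less reason to tie the label to gender. A real system would need a proper triple extractor and a fluent text generator in place of the two placeholder steps.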

Problem

Research questions and friction points this paper is trying to address.

Mitigates spurious correlations in training data

Enhances out-of-distribution generalization robustness

Simultaneously tackles multiple biases via semantic augmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic triple-level text decomposition and modification

Generating counterbias data via triple reconstruction

Simultaneously mitigating multiple biases for robustness
