🤖 AI Summary
Large language models (LLMs) often inherit spurious correlations and framing biases from training data, leading to unfair text summarization outputs. To address this, we propose a domain-agnostic adversarial training framework that enables universal debiasing of abstractive summarization without requiring bias labels or domain-specific knowledge. Our method, built upon a Seq2Seq architecture, introduces a novel Embedding-layer Gradient-guided Perturber (EGP) module that injects targeted perturbations into token embeddings via gradient-based guidance; integrates an adversarial loss to suppress bias-confounded representations; and employs a multi-dimensional bias evaluation protocol for robust assessment. Experiments across multiple benchmarks demonstrate significant mitigation of name-nationality and political stance biases while preserving ROUGE scores, outperforming standard Transformer baselines and back-translation-enhanced methods.
📝 Abstract
Large Language Models (LLMs) have achieved impressive performance in text summarization and are increasingly deployed in real-world applications. However, these systems often inherit associative and framing biases from pre-training data, leading to inappropriate or unfair outputs in downstream tasks. In this work, we present AdvSumm (Adversarial Summarization), a domain-agnostic training framework designed to mitigate bias in text summarization through improved generalization. Inspired by adversarial robustness, AdvSumm introduces a novel Perturber component that applies gradient-guided perturbations at the embedding level of Sequence-to-Sequence models, enhancing the model's robustness to input variations. We empirically demonstrate that AdvSumm effectively reduces different types of bias in summarization (specifically, name-nationality bias and political framing bias) without compromising summarization quality. Compared to standard Transformers and data augmentation techniques like back-translation, AdvSumm achieves stronger bias mitigation performance across benchmark datasets.
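The core mechanism, gradient-guided perturbation of token embeddings, can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `egp_perturb`, the FGSM-style sign update, and the toy linear stand-in for a Seq2Seq loss are all illustrative assumptions; in practice the gradient would come from backpropagating the summarization loss through the full model.

```python
import numpy as np

def egp_perturb(embeddings, loss_grad, epsilon=0.01):
    """FGSM-style embedding perturbation (illustrative sketch).

    Moves each token embedding a small step along the sign of the
    loss gradient, producing adversarially perturbed inputs that the
    model is then trained to summarize robustly.
    """
    return embeddings + epsilon * np.sign(loss_grad)

# Toy example: 4 tokens with 8-dim embeddings and an analytic gradient
# from a linear "loss" sum(emb @ w), standing in for backprop through
# a Seq2Seq summarizer.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))        # token embeddings
w = rng.normal(size=(8,))            # toy output weights
loss_grad = np.tile(w, (4, 1))       # d(sum(emb @ w)) / d(emb) = w per token

perturbed = egp_perturb(emb, loss_grad, epsilon=0.05)
```

The sign update bounds each coordinate's change by `epsilon`, so the perturbation stays within a small L-infinity ball around the original embeddings, which is what makes training on the perturbed inputs act as a smoothness/robustness regularizer rather than a label-dependent debiasing step.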