What Are They Filtering Out? A Survey of Filtering Strategies for Harm Reduction in Pretraining Datasets

📅 2025-02-17
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Pretraining data filtering strategies intended to remove harmful content inadvertently deepen the underrepresentation of marginalized groups, amplifying demographic bias at the data level. Method: We systematically reviewed 55 technical reports of English-language large language models to construct the first integrated data governance evaluation framework balancing safety and fairness. Through controlled experiments and quantitative bias analysis across mainstream filtering strategies, we measured their impact on group-level representation. Contribution/Results: Our analysis shows that such strategies reduce text associated with disadvantaged groups by 12.7%–38.4% on average, significantly worsening representational disparity. This study provides the first empirical evidence against the "safety implies fairness" assumption in AI governance. We propose a co-optimization paradigm that jointly addresses content safety and equitable group representation, advocating fairness-aware data curation in foundation model development.
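The measurement described above can be sketched in a few lines: compute a group's share of documents before and after a filter is applied, then report the relative change. This is a minimal illustration, not the paper's actual pipeline; the term list, the precomputed toxicity scores, and the 0.5 threshold are all hypothetical assumptions.

```python
# Sketch: how a harm filter can shift group-level representation.
# Term lists, toxicity scores, and the threshold are illustrative
# assumptions, not the paper's actual experimental setup.

def group_share(docs, group_terms):
    """Fraction of documents mentioning any of the group terms."""
    hits = sum(any(t in d["text"].lower() for t in group_terms) for d in docs)
    return hits / len(docs) if docs else 0.0

def apply_filter(docs, threshold):
    """Keep documents whose (precomputed) toxicity score is below threshold."""
    return [d for d in docs if d["toxicity"] < threshold]

def representation_change(docs, group_terms, threshold):
    """Relative change in a group's document share after filtering."""
    before = group_share(docs, group_terms)
    after = group_share(apply_filter(docs, threshold), group_terms)
    return (after - before) / before if before else 0.0

# Toy corpus where identity mentions co-occur with higher toxicity
# scores, mimicking the disparity the paper measures.
corpus = [
    {"text": "a recipe for soup", "toxicity": 0.05},
    {"text": "news about the economy", "toxicity": 0.10},
    {"text": "a profile of a gay activist", "toxicity": 0.55},
    {"text": "an essay on disability rights", "toxicity": 0.30},
    {"text": "weather report", "toxicity": 0.02},
]
terms = ["gay", "disability"]
delta = representation_change(corpus, terms, threshold=0.5)
print(f"relative change in group share: {delta:+.1%}")  # -37.5% on this toy data
```

The key observation the sketch makes concrete: a filter that only looks at a harm score removes identity-mentioning documents at a higher rate than the rest of the corpus, so the group's share drops even though no fairness criterion was ever violated explicitly.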

📝 Abstract
Data filtering strategies are a crucial component in developing safe Large Language Models (LLMs), since they support the removal of harmful content from pretraining datasets. However, there is a lack of research on the actual impact of these strategies on groups vulnerable to discrimination, and their effectiveness has not yet been systematically assessed. In this paper we present a benchmark study of data filtering strategies for harm reduction, aimed at providing a systematic overview of these approaches. We survey 55 technical reports of English LMs and LLMs to identify the filtering strategies in the literature and implement an experimental setting to test their impact on vulnerable groups. Our results show that the positive effect these strategies have in removing harmful content from documents comes with the side effect of increasing the underrepresentation of groups vulnerable to discrimination in the datasets.
Problem

Research questions and friction points this paper is trying to address.

Evaluating data filtering strategies for harm reduction in LLMs
Assessing filtering impact on vulnerable groups' representation
Analyzing side effects of content removal on dataset diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

First benchmark of harm-reduction filtering strategies drawn from 55 LM/LLM technical reports
Experimental setting quantifying each strategy's impact on vulnerable groups' representation
Empirical evidence that content filtering increases group underrepresentation, motivating fairness-aware curation