Investigating the Capabilities and Limitations of Machine Learning for Identifying Bias in English Language Data with Information and Heritage Professionals

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the capabilities and limitations of machine learning in identifying biased language within English-language corpora, specifically within information and cultural heritage workflows. Departing from dominant “debiasing” paradigms, it reframes bias identification as an interpretability task—prioritizing the *exposure* rather than elimination of data bias. Method: It introduces a context-sensitive annotation framework and a human-AI collaborative evaluation protocol, integrating BERT fine-tuning, qualitative workshops, and mixed-methods analysis to rigorously assess feasibility and ethical trade-offs. Contribution/Results: Findings reveal that bias is highly context-dependent and thus resistant to universal ML detection; debiasing interventions may simultaneously empower and marginalize different groups; and bias is structurally embedded and inherently unavoidable. The work advances fairness research from idealized algorithmic optimization toward domain-adapted, value-sensitive governance frameworks grounded in interpretability and situated practice.

📝 Abstract
Despite numerous efforts to mitigate their biases, ML systems continue to harm already-marginalized people. While predominant ML approaches assume bias can be removed and fair models can be created, we show that these are not always possible, nor desirable, goals. We reframe the problem of ML bias by creating models to identify biased language, drawing attention to a dataset's biases rather than trying to remove them. Then, through a workshop, we evaluated the models for a specific use case: workflows of information and heritage professionals. Our findings demonstrate the limitations of ML for identifying bias due to its contextual nature, the way in which approaches to mitigating it can simultaneously privilege and oppress different communities, and its inevitability. We demonstrate the need to expand ML approaches to bias and fairness, providing a mixed-methods approach to investigating the feasibility of removing bias or achieving fairness in a given ML use case.
Problem

Research questions and friction points this paper is trying to address.

Examining ML's ability to detect bias in English language data
Challenging the assumption that bias can be fully removed from ML
Evaluating ML limitations in identifying contextual biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Models flag biased language rather than removing it
Workshop evaluates the models in information and heritage professionals' workflows
Mixed-methods approach assesses the feasibility of removing bias in a given ML use case