FairPy: A Toolkit for Evaluation of Prediction Biases and their Mitigation in Large Language Models

📅 2023-02-10
📈 Citations: 5
Influential: 0
🤖 AI Summary
Large language models (e.g., BERT, GPT-2) exhibit implicit biases, such as gender and racial stereotyping, in token prediction, largely inherited from distributional skews in their training data. To address this, the authors introduce FairPy: an open-source, unified toolkit for assessing and mitigating fairness issues *during LLM inference*. FairPy adopts a modular architecture supporting diverse algorithms, including polarity/bias-score quantification, post-hoc calibration, and adversarial debiasing, and is compatible with Hugging Face Transformers models as well as custom architectures. By automating bias detection and mitigation, it closes a practical gap in the engineering deployment of LLM fairness analysis. Evaluated on standard benchmarks, FairPy reduces prediction disparities correlated with sensitive attributes. The implementation is fully open source.
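The polarity/bias-score quantification named above can be sketched as a log-ratio of a masked language model's probabilities for a stereotyped versus an anti-stereotyped completion of the same template. This is a minimal illustrative sketch, not FairPy's actual API; the function name `bias_score` and the example probabilities are hypothetical, and a real evaluation would obtain the probabilities from a model such as BERT.

```python
import math

def bias_score(p_stereo: float, p_anti: float) -> float:
    """Log-ratio bias score for one template pair.

    p_stereo: model probability of the stereotyped completion.
    p_anti:   model probability of the anti-stereotyped completion.
    Returns 0.0 when the model shows no preference; positive values
    indicate a preference for the stereotyped completion.
    """
    return math.log(p_stereo / p_anti)

# Hypothetical masked-LM probabilities for the template
# "The [MASK] is a nurse." with completions "woman" vs. "man".
score = bias_score(p_stereo=0.31, p_anti=0.11)
```

Averaging such scores over a benchmark of template pairs yields an aggregate bias measure for the model.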
📝 Abstract
Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions present in their training corpora. In response, a number of mathematical frameworks have been proposed to quantify, identify, and mitigate the likelihood of biased token predictions. In this paper, we present a comprehensive survey of such techniques tailored towards widely used LLMs such as BERT, GPT-2, etc. We additionally introduce Fairpy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models. Fairpy supports the implementation of existing debiasing algorithms. The toolkit is open-source and publicly available at: https://github.com/HrishikeshVish/Fairpy
Problem

Research questions and friction points this paper is trying to address.

Quantify and mitigate biases in large language models
Evaluate pretrained and custom models for token prediction biases
Provide modular toolkit for implementing debiasing algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular toolkit for bias evaluation
Plug-and-play debiasing algorithm integration
Open-source support for custom LLMs
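One simple member of the post-hoc calibration family mentioned in the summary can be sketched as equalizing a model's output probabilities across a pair of sensitive-attribute tokens. This is a hedged illustration of the general idea, not FairPy's implementation; the function `calibrate` and the token names are hypothetical.

```python
def calibrate(probs: dict[str, float], pair: tuple[str, str]) -> dict[str, float]:
    """Post-hoc calibration sketch: replace the probabilities of two
    attribute tokens (e.g. "he"/"she") with their mean, removing the
    model's measured preference between them. Because both tokens are
    set to the pair's mean, total probability mass is unchanged.
    """
    a, b = pair
    mean = (probs[a] + probs[b]) / 2
    out = dict(probs)
    out[a] = mean
    out[b] = mean
    return out

# Hypothetical next-token distribution for "The doctor said that [MASK] ..."
raw = {"he": 0.50, "she": 0.10, "they": 0.40}
fair = calibrate(raw, ("he", "she"))  # "he" and "she" now tie at 0.30
```

Real debiasing methods (e.g., adversarial debiasing or embedding projection) operate on model internals rather than final probabilities, but the plug-and-play interface idea is the same: wrap the model's prediction step with a correction.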