🤖 AI Summary
Large language models (e.g., BERT, GPT-2) exhibit implicit biases—such as gender and racial stereotyping—in token prediction, primarily stemming from distributional skews in training data. To address this, we introduce Fairpy: the first open-source, unified toolkit for fairness assessment and mitigation *during LLM inference*. Fairpy adopts a modular architecture supporting diverse algorithms—including polarity/bias-score quantification, post-hoc calibration, and adversarial debiasing—and is compatible with Hugging Face Transformers models as well as custom architectures. It bridges a critical gap in the engineering deployment of LLM fairness analysis by enabling automated bias detection and mitigation. Evaluated on standard benchmarks, Fairpy significantly reduces prediction disparities correlated with sensitive attributes. The fully open-sourced implementation has gained broad adoption within the research community.
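The bias-score idea mentioned above can be illustrated with a minimal sketch: compare the log-probabilities a language model assigns to sensitive-attribute tokens after the same prompt. The probabilities below are made up for illustration, and the helper `log_prob_gap` is hypothetical—it is not part of Fairpy's API.

```python
import math

# Toy next-token probabilities a language model might assign after a
# prompt like "The doctor said that ___". These numbers are illustrative,
# not taken from any real model.
next_token_probs = {"he": 0.12, "she": 0.04, "they": 0.03}

def log_prob_gap(probs, group_a, group_b):
    """Bias score as a log-probability gap between two sensitive-attribute
    tokens: 0 means parity; a positive value favors group_a."""
    return math.log(probs[group_a]) - math.log(probs[group_b])

score = log_prob_gap(next_token_probs, "he", "she")
print(round(score, 3))  # positive => the prompt's completion skews masculine
```

Debiasing methods such as post-hoc calibration can then be framed as adjusting the model's output distribution until gaps like this one shrink toward zero.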
📝 Abstract
Recent studies have demonstrated that large pretrained language models (LLMs) such as BERT and GPT-2 exhibit biases in token prediction, often inherited from the data distributions present in their training corpora. In response, a number of mathematical frameworks have been proposed to quantify, identify, and mitigate biased token predictions. In this paper, we present a comprehensive survey of such techniques tailored to widely used LLMs such as BERT and GPT-2. We additionally introduce Fairpy, a modular and extensible toolkit that provides plug-and-play interfaces for integrating these mathematical tools, enabling users to evaluate both pretrained and custom language models, and supporting implementations of existing debiasing algorithms. The toolkit is open-source and publicly available at: https://github.com/HrishikeshVish/Fairpy