🤖 AI Summary
Existing fairness-aware machine learning research suffers from limited dataset availability, ad hoc dataset selection, inconsistent preprocessing, and insufficient metadata—resulting in poor generalizability and low reproducibility. To address these challenges, we introduce FairGround: the first unified data framework specifically designed for algorithmic fairness research. FairGround comprises 44 diverse, tabular datasets spanning multiple domains, each richly annotated with fine-grained fairness-related metadata—including formal definitions of sensitive attributes, bias types, and socio-contextual information. Complementing the framework is an open-source Python toolkit that enables standardized data loading, configurable preprocessing, principled train/validation/test splits, and end-to-end reproducible experimentation. By providing consistent, transparent, and well-documented resources, FairGround significantly enhances the consistency, comparability, and reproducibility of fairness evaluations. It thereby advances fair machine learning toward greater methodological rigor, transparency, and standardization.
📝 Abstract
As machine learning (ML) systems are increasingly adopted in high-stakes decision-making domains, ensuring fairness in their outputs has become a central challenge. At the core of fair ML research are the datasets used to investigate bias and develop mitigation strategies. Yet, much of the existing work relies on a narrow selection of datasets--often arbitrarily chosen, inconsistently processed, and lacking in diversity--undermining the generalizability and reproducibility of results.
To address these limitations, we present FairGround: a unified framework, data corpus, and Python package aimed at advancing reproducible research and critical data studies in fair ML classification. FairGround currently comprises 44 tabular datasets, each annotated with rich fairness-relevant metadata. Our accompanying Python package standardizes dataset loading, preprocessing, transformation, and splitting, streamlining experimental workflows. By providing a diverse and well-documented dataset corpus along with robust tooling, FairGround enables the development of fairer, more reliable, and more reproducible ML models. All resources are publicly available to support open and collaborative research.