🤖 AI Summary
In algorithmic hiring, language models often generate job descriptions containing implicit gender and racial biases, undermining fairness and diversity. To address this, we propose AutoRefine—a novel method that introduces reinforcement learning for targeted fine-tuning guided by downstream, measurable fairness metrics (e.g., diversity match rate), eliminating the need for human feedback. AutoRefine integrates bias detection, lightweight fine-tuning of large language models, and seamless integration with recommendation systems. Evaluated on public benchmarks and a real-world recruitment platform, it achieves significant improvements in gender and racial diversity match rates (+18.7% on average) while strictly preserving job relevance. Our core contribution is an end-to-end, quantifiable, human-feedback-free fairness alignment framework—establishing an efficient, production-ready paradigm for bias mitigation in algorithmic hiring.
📝 Abstract
Foundation models require fine-tuning to ensure their generative outputs align with intended results for specific tasks. Automating this fine-tuning process is challenging, as it typically requires human feedback that can be expensive to acquire. We present AutoRefine, a method that leverages reinforcement learning for targeted fine-tuning, using direct feedback from measurable performance improvements on specific downstream tasks. We demonstrate the method on a problem arising in algorithmic hiring platforms, where linguistic biases influence a recommendation system. In this setting, a generative model rewrites given job specifications so that a recommendation engine matching jobs to candidates returns more diverse candidate matches. Our model detects and regulates biases in job descriptions to meet diversity and fairness criteria. Experiments on a public hiring dataset and a real-world hiring platform showcase how large language models can assist in identifying and mitigating biases in the real world.
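The feedback signal described above can be made concrete with a small sketch. All names here are hypothetical illustrations, not the paper's implementation: a toy diversity match rate over a recommender's returned candidates, and a scalar reward that credits diversity gains while penalizing any loss of job relevance, which could then drive an RL fine-tuning step in place of human feedback.

```python
def diversity_match_rate(matched_candidates, num_groups):
    """Fraction of demographic groups represented among matched candidates.

    `matched_candidates` is a list of dicts with a "group" key, as a toy
    stand-in for the recommendation engine's output; `num_groups` is the
    total number of demographic groups tracked by the platform.
    """
    represented = {c["group"] for c in matched_candidates}
    return len(represented) / num_groups


def fairness_reward(rate_before, rate_after, relevance_drop, penalty=5.0):
    """Scalar reward for a rewritten job description.

    Rewards the improvement in diversity match rate and penalizes any
    decrease in job relevance, so rewrites cannot trade relevance for
    diversity. The penalty weight is an illustrative hyperparameter.
    """
    return (rate_after - rate_before) - penalty * max(0.0, relevance_drop)
```

A usage example: if rewriting a posting raises the match rate from 0.25 to 0.5 with no relevance loss, `fairness_reward(0.25, 0.5, 0.0)` returns a positive reward (0.25); the same diversity gain with a 0.1 relevance drop yields a negative reward, steering the policy away from relevance-destroying rewrites.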