Auditing LLMs for Algorithmic Fairness in Casenote-Augmented Tabular Prediction

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines an underexplored question: how fair are large language models (LLMs) when used for tabular prediction in high-stakes social service settings, particularly when street outreach case notes are added as input and could widen disparities in classification errors? The authors audit LLM fairness on a real-world housing placement prediction task, incorporating case note summaries from a nonprofit partner and comparing fine-tuned and zero-shot approaches. They find that a fine-tuned model augmented with case note summaries can improve predictive accuracy while narrowing fairness gaps across demographic groups. Zero-shot classification does not appear to introduce textual bias beyond the algorithmic bias already present in tabular classification, but variable importance improvements to it yield mixed fairness results. The work offers empirical evidence and a low-burden path toward responsible use of LLMs in social services.

📝 Abstract

LLMs are increasingly being considered for prediction tasks in high-stakes social service settings, but their algorithmic fairness properties in this context are poorly understood. In this short technical report, we audit the algorithmic fairness of LLM-based tabular classification on a real housing placement prediction task, augmented with street outreach casenotes from a nonprofit partner. We audit multi-class classification error disparities. We find that a fine-tuned model augmented with casenote summaries can improve accuracy while reducing algorithmic fairness disparities. We experiment with variable importance improvements to zero-shot tabular classification and find mixed results on resulting algorithmic fairness. Overall, given historical inequities in housing placement, it is crucial to audit LLM use. We find that leveraging LLMs to augment tabular classification with casenote summaries can safely leverage additional text information at low implementation burden. The outreach casenotes are fairly short and heavily redacted. Our assessment is that LLM zero-shot classification does not introduce additional textual biases beyond algorithmic biases in tabular classification. Combining fine-tuning and leveraging casenote summaries can improve accuracy and algorithmic fairness.
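
To make "multi-class classification error disparities" concrete, here is a minimal Python sketch of one way such an audit could be computed. It is not the authors' implementation. The function names, the class labels (`perm`, `temp`, `none`), the group labels (`A`, `B`), and the toy predictions are all hypothetical, and the report may define disparity differently. For each demographic group, the sketch computes the error rate within each true class, then reports the largest between-group gap per class.

```python
from collections import defaultdict


def per_group_class_error_rates(y_true, y_pred, groups):
    """Return {group: {true_class: share of that class misclassified}}."""
    totals = defaultdict(lambda: defaultdict(int))
    errors = defaultdict(lambda: defaultdict(int))
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group][truth] += 1
        if pred != truth:
            errors[group][truth] += 1
    return {
        group: {cls: errors[group][cls] / n for cls, n in by_class.items()}
        for group, by_class in totals.items()
    }


def max_disparity(rates):
    """For each class, the largest error-rate gap between any two groups."""
    classes = {cls for by_class in rates.values() for cls in by_class}
    gaps = {}
    for cls in classes:
        values = [by_class[cls] for by_class in rates.values() if cls in by_class]
        gaps[cls] = max(values) - min(values)
    return gaps


# Hypothetical toy data: three placement outcomes, two demographic groups.
y_true = ["perm", "perm", "temp", "temp", "none", "none"] * 2
y_pred = [
    "perm", "temp", "temp", "temp", "none", "none",  # group A predictions
    "perm", "perm", "temp", "none", "none", "perm",  # group B predictions
]
groups = ["A"] * 6 + ["B"] * 6

rates = per_group_class_error_rates(y_true, y_pred, groups)
gaps = max_disparity(rates)
```

Running the same two functions on predictions from a tabular-only baseline, a zero-shot LLM, and a fine-tuned casenote-augmented model would give directly comparable per-class gaps. That is the kind of comparison the abstract describes, though the exact metric used in the report is not stated here.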

Problem

Research questions and friction points this paper is trying to address.

algorithmic fairness

LLMs

tabular prediction

casenote augmentation

classification disparities

Innovation

Methods, ideas, or system contributions that make the work stand out.

algorithmic fairness

LLM-augmented tabular prediction

casenote summarization