🤖 AI Summary
This paper addresses the fairness bias that in-context learning induces in large language models (LLMs) used for tabular classification. We propose the first in-context example selection framework to incorporate a dynamic validation mechanism. Our method replaces static validation with a dynamically evolving validation set that adapts to the test distribution, and introduces SMITE, an iterative optimization algorithm that jointly optimizes accuracy and fairness (e.g., demographic parity and equal opportunity) to retrieve optimal in-context examples. Extensive experiments across four state-of-the-art LLMs (GPT-4, Claude-3, Qwen2, and Llama-3) demonstrate that our approach consistently improves classification accuracy (average +2.1%) and multiple fairness metrics (up to +38.5%), while exhibiting strong cross-model robustness. The framework provides a scalable, verifiable technical pathway for responsible LLM deployment in sensitive tabular applications.
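The two fairness criteria named above have standard definitions: demographic parity compares positive-prediction rates across demographic groups, and equal opportunity compares true-positive rates. As a minimal illustration (not taken from the paper's code), the gaps for a binary sensitive attribute can be computed like this:

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between two groups."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_0 = y_pred[group == 0].mean()  # P(y_hat = 1 | group 0)
    rate_1 = y_pred[group == 1].mean()  # P(y_hat = 1 | group 1)
    return abs(rate_0 - rate_1)

def equal_opportunity_gap(y_pred, y_true, group):
    """Absolute difference in true-positive rates between two groups."""
    y_pred, y_true, group = map(np.asarray, (y_pred, y_true, group))
    tpr_0 = y_pred[(group == 0) & (y_true == 1)].mean()  # TPR for group 0
    tpr_1 = y_pred[(group == 1) & (y_true == 1)].mean()  # TPR for group 1
    return abs(tpr_0 - tpr_1)
```

Smaller gaps mean fairer predictions; a joint objective like the one described above would combine such gaps with a classification-error term.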
📝 Abstract
Large Language Models (LLMs) are widely used for downstream tasks such as tabular classification, where ensuring fairness in their outputs is critical for inclusivity, equal representation, and responsible AI deployment. This study introduces a novel approach to enhancing LLM performance and fairness through the concept of a dynamic validation set, which evolves alongside the test set, replacing the traditional static validation approach. We also propose an iterative algorithm, SMITE, to select optimal in-context examples, with each example set validated against its corresponding dynamic validation set. The in-context set with the lowest total error is used as the final demonstration set. Our experiments across four different LLMs show that our proposed techniques significantly improve both predictive accuracy and fairness compared to baseline methods. To our knowledge, this is the first study to apply dynamic validation in the context of in-context learning for LLMs.
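The selection procedure described above (iteratively scoring candidate demonstration sets against a validation set that evolves with the test data, then keeping the set with the lowest total error) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the paper's actual SMITE implementation: `score_fn`, the random candidate search, and the re-sampled validation pool are all hypothetical stand-ins for the details given in the paper.

```python
import random

def select_demonstrations(candidates, val_pool, score_fn, k=4, iters=20, seed=0):
    """Search for the k-example demonstration set with the lowest total error.

    score_fn(demo_set, val_set) is assumed to return a combined
    accuracy + fairness error to be minimized (e.g., error rate plus
    a demographic-parity gap), evaluated by prompting the LLM.
    """
    rng = random.Random(seed)
    best_set, best_err = None, float("inf")
    for _ in range(iters):
        demo = rng.sample(candidates, k)
        # Re-draw the validation set each iteration: a toy stand-in for a
        # validation pool that evolves alongside the incoming test stream,
        # rather than a single static held-out split.
        val = rng.sample(val_pool, min(len(val_pool), 8))
        err = score_fn(demo, val)
        if err < best_err:
            best_set, best_err = demo, err
    return best_set, best_err
```

The returned `best_set` plays the role of the final demonstration set used to prompt the model on the test instances.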