The Impact of Inference Acceleration on Bias of LLMs

📅 2024-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the implicit assumption that inference acceleration techniques such as quantization, pruning, and KV caching preserve fairness in large language models (LLMs). We show that these methods alter demographic biases (e.g., gender, race) in LLM-generated outputs in ways that are nonlinear and model-dependent. To this end, we develop a multi-dimensional bias evaluation framework and conduct systematic cross-model comparisons across several state-of-the-art LLMs under diverse acceleration strategies. Our empirical analysis reveals that bias shifts are substantial, inconsistent, and often unpredictable; certain model-acceleration combinations even exacerbate bias significantly. Our key contribution is the empirical validation of this complex, non-additive coupling between acceleration and bias, which leads to a methodological imperative: bias must be re-evaluated case by case for each model-acceleration pairing. This finding provides practical guidance for deploying efficient yet equitable LLMs.

📝 Abstract
The last few years have seen unprecedented advances in the capabilities of Large Language Models (LLMs). These advancements promise to benefit a vast array of application domains. However, due to their immense size, performing inference with LLMs is both costly and slow. Consequently, a plethora of recent work has proposed strategies to enhance inference efficiency, e.g., quantization, pruning, and caching. These acceleration strategies reduce the inference cost and latency, often by several factors, while maintaining much of the predictive performance measured via common benchmarks. In this work, we explore another critical aspect of LLM performance: demographic bias in model generations due to inference acceleration optimizations. Using a wide range of metrics, we probe bias in model outputs from a number of angles. Analysis of outputs before and after inference acceleration shows significant changes in bias. Worryingly, these bias effects are complex and unpredictable. A combination of an acceleration strategy and bias type may show little bias change in one model but lead to a large effect in another. Our results highlight the need for in-depth, case-by-case evaluation of model bias after a model has been modified to accelerate inference.
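The before/after comparison described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual framework: the group names, scores, and the simple mean-difference metric are all placeholder assumptions standing in for real bias metrics computed over demographic-swapped prompts.

```python
# Sketch of a case-by-case bias re-evaluation for one model-acceleration
# pairing. The score lists below are illustrative stand-ins for a real
# per-prompt bias metric (e.g., toxicity scored on outputs for prompts
# that differ only in a demographic attribute).

def group_bias(scores_a, scores_b):
    """Bias = difference in mean metric score between two demographic groups."""
    return sum(scores_a) / len(scores_a) - sum(scores_b) / len(scores_b)

def bias_shift(baseline, accelerated):
    """Change in bias introduced by an acceleration strategy."""
    return group_bias(*accelerated) - group_bias(*baseline)

# Hypothetical scores: (group_A, group_B) under each configuration.
baseline = ([0.20, 0.25, 0.22], [0.21, 0.24, 0.23])   # full-precision model
quantized = ([0.30, 0.28, 0.35], [0.20, 0.22, 0.21])  # same model, quantized

print(f"bias shift after quantization: {bias_shift(baseline, quantized):+.3f}")
```

The point of the paper is precisely that this shift cannot be assumed small or even consistent in sign across models, which is why the re-evaluation has to be repeated for every model-acceleration pairing rather than done once.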
Problem

Research questions and friction points this paper is trying to address.

Inference acceleration impacts LLM bias.
Acceleration strategies unpredictably alter demographic bias.
Case-by-case bias evaluation post-acceleration is essential.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM inference acceleration
bias change analysis
case-by-case evaluation