Attention Pruning: Automated Fairness Repair of Language Models via Surrogate Simulated Annealing

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses social fairness issues in large language models (LLMs) arising from biases inherent in their training data, proposing a post-hoc attention head pruning method that requires neither retraining nor architectural modification. To tackle the combinatorial challenge of efficiently locating bias sources within an ultra-high-dimensional parameter space, the authors introduce a proxy-model-based randomized simulated annealing search framework. This framework transforms the computationally expensive LLM fairness optimization into a lightweight multi-objective combinatorial optimization problem over a surrogate network, jointly optimizing for fairness improvement and preservation of language capability. Experiments across multiple benchmarks demonstrate that the method achieves up to a 40% reduction in gender bias, substantially outperforming existing state-of-the-art approaches, while leaving model utility almost entirely intact.

📝 Abstract
This paper explores pruning attention heads as a post-processing bias mitigation method for large language models (LLMs). Modern AI systems such as LLMs are expanding into sensitive social contexts where fairness concerns become especially crucial. Since LLMs develop decision-making patterns by training on massive datasets of human-generated content, they naturally encode and perpetuate societal biases. While modifying training datasets and algorithms is expensive and requires significant resources, post-processing techniques, such as selectively deactivating neurons and attention heads in pre-trained LLMs, can provide feasible and effective approaches to improve fairness. However, identifying the optimal subset of parameters to prune presents a combinatorial challenge within LLMs' immense parameter space, requiring solutions that efficiently balance competing objectives along the frontier of model fairness and utility. To address the computational challenges, we explore a search-based program repair approach via randomized simulated annealing. Given the prohibitive evaluation costs in billion-parameter LLMs, we develop surrogate deep neural networks that efficiently model the relationship between attention head states (active/inactive) and the corresponding fairness/utility metrics. This allows us to perform optimization over the surrogate models and efficiently identify optimal subsets of attention heads for selective pruning, rather than directly searching through the LLM parameter space. This paper introduces Attention Pruning, a fairness-aware surrogate simulated annealing approach to prune attention heads in LLMs that disproportionately contribute to bias while minimally impacting overall model utility. Our experiments show that Attention Pruning achieves up to a 40% reduction in gender bias and outperforms state-of-the-art bias mitigation strategies.
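The surrogate-guided search the abstract describes can be sketched in miniature. This is not the paper's implementation: `surrogate_score` here is a toy stand-in for the trained surrogate network that predicts fairness/utility from the head on/off pattern, and all names, constants, and the scoring heuristic are illustrative assumptions.

```python
import math
import random

def surrogate_score(mask):
    # Toy stand-in for the paper's surrogate DNN: it rewards pruning
    # (hypothetically) bias-carrying heads at even indices, while
    # penalizing pruning too many heads (utility loss). Illustrative only.
    fairness_gain = sum(1 - m for i, m in enumerate(mask) if i % 2 == 0)
    utility_loss = sum(1 - m for m in mask) ** 1.5
    return fairness_gain - 0.1 * utility_loss

def simulated_annealing(n_heads=16, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search over binary attention-head masks (1 = active, 0 = pruned)."""
    rng = random.Random(seed)
    mask = [1] * n_heads                  # start with all heads active
    best = list(mask)
    t = t0
    for _ in range(steps):
        cand = list(mask)
        cand[rng.randrange(n_heads)] ^= 1     # flip one head on/off
        delta = surrogate_score(cand) - surrogate_score(mask)
        # Accept improvements always; accept regressions with probability
        # exp(delta / t), which shrinks as the temperature cools.
        if delta > 0 or rng.random() < math.exp(delta / t):
            mask = cand
        if surrogate_score(mask) > surrogate_score(best):
            best = list(mask)
        t *= cooling
    return best
```

Because every candidate is scored by the cheap surrogate rather than by evaluating the LLM itself, thousands of annealing steps cost far less than a single full fairness evaluation of a billion-parameter model.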
Problem

Research questions and friction points this paper is trying to address.

Mitigates bias in large language models via attention head pruning.
Addresses fairness-utility trade-offs in billion-parameter LLMs.
Uses surrogate models to optimize pruning for bias reduction.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surrogate simulated annealing for bias mitigation
Attention head pruning in large language models
Efficient fairness-utility optimization via surrogate models
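To make the pruning operation itself concrete, the following is a minimal NumPy sketch of masking heads in multi-head attention: a pruned head's output is zeroed before the heads are concatenated. This is an assumed, simplified illustration (no output projection, no batching), not the paper's code.

```python
import numpy as np

def multi_head_attention(x, wq, wk, wv, head_mask):
    """x: (seq, d_model); wq/wk/wv: (n_heads, d_model, d_head);
    head_mask: (n_heads,) with 1 = active, 0 = pruned."""
    n_heads, d_model, d_head = wq.shape
    outputs = []
    for h in range(n_heads):
        q, k, v = x @ wq[h], x @ wk[h], x @ wv[h]
        scores = q @ k.T / np.sqrt(d_head)
        # Numerically stable softmax over the last axis.
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)
        outputs.append(head_mask[h] * (attn @ v))  # zero out pruned heads
    return np.concatenate(outputs, axis=-1)       # (seq, n_heads * d_head)
```

A mask found by the annealing search would be applied this way at inference time, which is why the method needs neither retraining nor architectural changes.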