IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are vulnerable to biases in their training data, which can lead to toxic generation. Existing domain-adaptive repair approaches typically fine-tune all model parameters indiscriminately, yielding low repair quality and degrading the model's general capabilities. IRepair addresses this with an intent-aware, dynamic slicing-based repair strategy: it identifies the model's most error-sensitive layers during training and concentrates repair on that slice, leaving the rest of the model untouched. An empirical finding motivates this design: errors are heavily concentrated, with the top 20% of layers exhibiting 773% higher error density than the remaining 80%. Evaluated in a toxicity-mitigation setup on three GPT-2 and GPT-Neo models (800M–1.6B parameters), IRepair repairs errors 43.6% more effectively than the closest baseline, direct preference optimization (DPO), while causing 46% less disruption to general performance.
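The core selection step described above can be sketched as a simple ranking over per-layer sensitivity scores. This is an illustrative sketch only: the function name and the toy scores are assumptions, not the paper's implementation, and in practice the scores would come from measurements (e.g. gradients on error-triggering inputs) rather than a fixed list.

```python
# Hypothetical sketch: pick the most error-sensitive layers for repair.
# Scores are illustrative stand-ins for measured per-layer sensitivity.

def select_sensitive_layers(sensitivity, fraction=0.2):
    """Return indices of the top `fraction` most sensitive layers."""
    k = max(1, int(len(sensitivity) * fraction))
    ranked = sorted(range(len(sensitivity)),
                    key=lambda i: sensitivity[i], reverse=True)
    return sorted(ranked[:k])

# Toy per-layer sensitivity scores for a 10-layer model; two layers
# dominate, mirroring the paper's finding that errors are concentrated.
scores = [0.1, 0.3, 2.9, 0.2, 0.1, 3.4, 0.2, 0.1, 0.4, 0.2]
print(select_sensitive_layers(scores))  # -> [2, 5]
```

With 10 layers and `fraction=0.2`, the two highest-scoring layers (indices 2 and 5) form the repair slice; only these would be updated during correction.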

📝 Abstract
Not a day goes by without hearing about the impressive feats of large language models (LLMs), and equally, not a day passes without hearing about their challenges. LLMs are notoriously vulnerable to biases in their dataset, leading to issues such as toxicity. While domain-adaptive training has been employed to mitigate these issues, these techniques often address all model parameters indiscriminately during the repair process, resulting in poor repair quality and reduced model versatility. In this paper, we introduce a novel dynamic slicing-based intent-aware LLM repair strategy, IRepair. This approach selectively targets the most error-prone sections of the model for repair. Specifically, we propose dynamically slicing the model's most sensitive layers that require immediate attention, concentrating repair efforts on those areas. This method enables more effective repairs with potentially less impact on the model's overall performance by altering a smaller portion of the model. We evaluated our technique on three models from the GPT-2 and GPT-Neo families, with parameters ranging from 800M to 1.6B, in a toxicity mitigation setup. Our results show that IRepair repairs errors 43.6% more effectively while causing 46% less disruption to general performance compared to the closest baseline, direct preference optimization. Our empirical analysis also reveals that errors are more concentrated in a smaller section of the model, with the top 20% of layers exhibiting 773% more error density than the remaining 80%. This highlights the need for selective repair. Additionally, we demonstrate that a dynamic selection approach is essential for addressing errors dispersed throughout the model, ensuring a robust and efficient repair.
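The abstract's "dynamic slicing" idea — re-selecting the most sensitive layers as repair progresses, rather than fixing the slice once — can be sketched as a training loop that periodically re-ranks layers. This is a framework-agnostic sketch under stated assumptions: `measure_sensitivity` and `update_layers` are hypothetical callbacks standing in for sensitivity measurement and the preference-style update on the selected slice, and the re-slicing interval is an assumption, not the paper's exact schedule.

```python
# Minimal sketch of dynamic slicing during repair (illustrative only;
# callback names and the re-slicing interval are assumptions).

def top_fraction(scores, fraction):
    """Indices of the top `fraction` of layers by sensitivity score."""
    k = max(1, int(len(scores) * fraction))
    return sorted(sorted(range(len(scores)),
                         key=lambda i: scores[i], reverse=True)[:k])

def repair(num_steps, measure_sensitivity, update_layers,
           reslice_every=100, fraction=0.2):
    """Repair loop: periodically re-select the most sensitive slice,
    then apply repair updates to that slice only."""
    slice_ids = []
    for step in range(num_steps):
        if step % reslice_every == 0:      # dynamic re-slicing
            slice_ids = top_fraction(measure_sensitivity(), fraction)
        update_layers(slice_ids)           # all other layers stay frozen
    return slice_ids
```

Because the slice is recomputed during training, repair can follow errors that are dispersed across the model over time — the robustness property the abstract attributes to dynamic, rather than static, selection.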
Problem

Research questions and friction points this paper is trying to address.

Repair data-driven errors in LLMs
Selectively target error-prone model sections
Improve repair quality and model versatility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic slicing-based repair strategy
Selective error-prone section targeting
Intent-aware LLM repair approach