IRepair: An Intent-Aware Approach to Repair Data-Driven Errors in Large Language Models

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) are vulnerable to biases in their training data, which can lead to toxic generation. Existing domain-adaptive repair approaches typically fine-tune all model parameters indiscriminately, yielding low repair quality and degrading the model's general capabilities. IRepair addresses this with an intent-aware, dynamic slicing-based repair strategy: it identifies the model's most error-sensitive layers during training and concentrates repair on that slice, leaving the rest of the model untouched. An empirical finding motivates this design: errors are heavily concentrated, with the top 20% of layers exhibiting 773% higher error density than the remaining 80%. Evaluated in a toxicity-mitigation setup on three GPT-2 and GPT-Neo models (800M–1.6B parameters), IRepair repairs errors 43.6% more effectively than the closest baseline, direct preference optimization (DPO), while causing 46% less disruption to general performance.
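The core selection step described above can be sketched as a simple ranking over per-layer sensitivity scores. This is an illustrative sketch only: the function name and the toy scores are assumptions, not the paper's implementation, and in practice the scores would come from measurements (e.g. gradients on error-triggering inputs) rather than a fixed list.

```python
# Hypothetical sketch: pick the most error-sensitive layers for repair.
# Scores are illustrative stand-ins for measured per-layer sensitivity.

def select_sensitive_layers(sensitivity, fraction=0.2):
    """Return indices of the top `fraction` most sensitive layers."""
    k = max(1, int(len(sensitivity) * fraction))
    ranked = sorted(range(len(sensitivity)),
                    key=lambda i: sensitivity[i], reverse=True)
    return sorted(ranked[:k])

# Toy per-layer sensitivity scores for a 10-layer model; two layers
# dominate, mirroring the paper's finding that errors are concentrated.
scores = [0.1, 0.3, 2.9, 0.2, 0.1, 3.4, 0.2, 0.1, 0.4, 0.2]
print(select_sensitive_layers(scores))  # -> [2, 5]
```

With 10 layers and `fraction=0.2`, the two highest-scoring layers (indices 2 and 5) form the repair slice; only these would be updated during correction.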

📝 Abstract
Not a day goes by without hearing about the impressive feats of large language models (LLMs), and equally, not a day passes without hearing about their challenges. LLMs are notoriously vulnerable to biases in their dataset, leading to issues such as toxicity. While domain-adaptive training has been employed to mitigate these issues, these techniques often address all model parameters indiscriminately during the repair process, resulting in poor repair quality and reduced model versatility. In this paper, we introduce a novel dynamic slicing-based intent-aware LLM repair strategy, IRepair. This approach selectively targets the most error-prone sections of the model for repair. Specifically, we propose dynamically slicing the model's most sensitive layers that require immediate attention, concentrating repair efforts on those areas. This method enables more effective repairs with potentially less impact on the model's overall performance by altering a smaller portion of the model. We evaluated our technique on three models from the GPT-2 and GPT-Neo families, with parameters ranging from 800M to 1.6B, in a toxicity mitigation setup. Our results show that IRepair repairs errors 43.6% more effectively while causing 46% less disruption to general performance compared to the closest baseline, direct preference optimization. Our empirical analysis also reveals that errors are more concentrated in a smaller section of the model, with the top 20% of layers exhibiting 773% more error density than the remaining 80%. This highlights the need for selective repair. Additionally, we demonstrate that a dynamic selection approach is essential for addressing errors dispersed throughout the model, ensuring a robust and efficient repair.
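The abstract's "dynamic slicing" idea — re-selecting the most sensitive layers as repair progresses, rather than fixing the slice once — can be sketched as a training loop that periodically re-ranks layers. This is a framework-agnostic sketch under stated assumptions: `measure_sensitivity` and `update_layers` are hypothetical callbacks standing in for sensitivity measurement and the preference-style update on the selected slice, and the re-slicing interval is an assumption, not the paper's exact schedule.

```python
# Minimal sketch of dynamic slicing during repair (illustrative only;
# callback names and the re-slicing interval are assumptions).

def top_fraction(scores, fraction):
    """Indices of the top `fraction` of layers by sensitivity score."""
    k = max(1, int(len(scores) * fraction))
    return sorted(sorted(range(len(scores)),
                         key=lambda i: scores[i], reverse=True)[:k])

def repair(num_steps, measure_sensitivity, update_layers,
           reslice_every=100, fraction=0.2):
    """Repair loop: periodically re-select the most sensitive slice,
    then apply repair updates to that slice only."""
    slice_ids = []
    for step in range(num_steps):
        if step % reslice_every == 0:      # dynamic re-slicing
            slice_ids = top_fraction(measure_sensitivity(), fraction)
        update_layers(slice_ids)           # all other layers stay frozen
    return slice_ids
```

Because the slice is recomputed during training, repair can follow errors that are dispersed across the model over time — the robustness property the abstract attributes to dynamic, rather than static, selection.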
Problem

Research questions and friction points this paper is trying to address.

Repair data-driven errors in LLMs
Selectively target error-prone model sections
Improve repair quality and model versatility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic slicing-based repair strategy
Selective error-prone section targeting
Intent-aware LLM repair approach