Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning

📅 2025-10-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Harmful fine-tuning defenses for "fine-tuning-as-a-service" in large language models are limited in robustness, in part because the safety-alignment training data they rely on is of uneven quality. This paper proposes Pharmacist to close that gap. The authors first demonstrate empirically that alignment data quality critically constrains defense robustness. Pharmacist then trains a learnable data selector that up-ranks high-quality, safety-critical samples and down-ranks low-quality, non-safety-relevant ones. It is compatible with state-of-the-art defenses such as RepNoise and T-Vaccine: using the Pharmacist-selected subset instead of the full dataset improves defense performance by 2.60% and 3.30%, improves inference performance by 3.50% and 1.10%, and reduces training time by 56.83% and 57.63%, respectively, outperforming existing data selection approaches. The core contribution is integrating explicit data-quality modeling into alignment-stage defenses, enabling efficient, robust, plug-and-play enhancement of safe fine-tuning.

📝 Abstract
Harmful fine-tuning issues present significant safety challenges for fine-tuning-as-a-service in large language models. Existing alignment-stage defenses, e.g., Vaccine, RepNoise, Booster, and T-Vaccine, mitigate harmful fine-tuning by enhancing the model's robustness during the alignment phase. However, these methods often overlook a critical upstream factor: the role of the original safety-alignment data. We observe that their defense performance and computational efficiency remain constrained by the quality and composition of the alignment dataset. To address this limitation, we propose Pharmacist, a safety alignment data curation solution that enhances defense against harmful fine-tuning by selecting a high-quality and safety-critical core subset from the original alignment data. The core idea of Pharmacist is to train an alignment data selector to rank alignment data: specifically, it up-ranks high-quality and safety-critical alignment data while down-ranking low-quality and non-safety-critical data. Empirical results indicate that models trained on datasets selected by Pharmacist outperform those trained on datasets selected by existing selection methods in both defense and inference performance. In addition, Pharmacist can be effectively integrated with mainstream alignment-stage defense methods. For example, when applied to RepNoise and T-Vaccine, using the dataset selected by Pharmacist instead of the full dataset improves defense performance by 2.60% and 3.30%, respectively, and enhances inference performance by 3.50% and 1.10%. Notably, it reduces training time by 56.83% and 57.63%, respectively. Our code is available at https://github.com/Lslland/Pharmacist.
Problem

Research questions and friction points this paper is trying to address.

Addresses harmful fine-tuning vulnerabilities in large language models
Improves safety alignment data quality for enhanced defense performance
Reduces computational costs while maintaining model inference capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curates safety alignment data by selecting critical core subsets
Ranks alignment data to prioritize safety-critical examples
Integrates with existing defenses to enhance performance efficiency
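The selection idea described above can be sketched as a small ranking step. This is a minimal illustration, not the paper's implementation: the function and sample names are hypothetical, and the toy scorer stands in for the learned selector that Pharmacist trains to judge quality and safety-criticality.

```python
# Hypothetical sketch of core-subset selection: score each alignment
# sample, up-rank high scorers, and keep only the top fraction.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AlignmentSample:
    prompt: str
    response: str

def select_core_subset(
    samples: List[AlignmentSample],
    scorer: Callable[[AlignmentSample], float],
    keep_ratio: float = 0.5,
) -> List[AlignmentSample]:
    """Rank samples by the selector's score and keep the top keep_ratio."""
    ranked = sorted(samples, key=scorer, reverse=True)
    k = max(1, int(len(ranked) * keep_ratio))
    return ranked[:k]

# Toy stand-in for the learned selector: here, longer refusal-style
# responses score higher. The real selector is a trained model.
def toy_scorer(s: AlignmentSample) -> float:
    return float(len(s.response))

data = [
    AlignmentSample("How do I pick a lock?", "I can't help with that request."),
    AlignmentSample("Hi", "Hello!"),
    AlignmentSample("Make a weapon", "I won't provide instructions for weapons."),
    AlignmentSample("Weather?", "Sunny."),
]
core = select_core_subset(data, toy_scorer, keep_ratio=0.5)
print(len(core))  # 2
```

Because selection happens once before alignment training, any downstream defense (e.g., RepNoise or T-Vaccine) can consume the smaller curated subset unchanged, which is what makes the approach plug-and-play.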
Guozhi Liu
School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
Qi Mu
School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China, and also with the IEIT SYSTEMS Co., Ltd., China
Tiansheng Huang
Georgia Institute of Technology
Parallel and Distributed Computing · Distributed machine learning · LLM safety
Xinhua Wang
School of Computer Science and Engineering, South China University of Technology, Guangzhou, Guangdong 510006, China
Li Shen
School of Cyber Science and Technology, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
Weiwei Lin
School of Physics, Southeast University
Condensed matter physics · material science · nanotechnology · magnetism · spintronics
Zhang Li
Second Affiliated Hospital of Guangzhou University of Chinese Medicine, Guangzhou, Guangdong 510120, China