The Dual-use Dilemma in LLMs: Do Empowering Ethical Capacities Make a Degraded Utility?

📅 2025-01-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) face an inherent trade-off between safely rejecting harmful chemical queries and maintaining high-quality responses to legitimate ones. Method: We propose a DPO-based collaborative alignment framework, featuring a novel balanced seed data generation mechanism and an LLM-as-judge hybrid evaluation scheme. We construct LibraChemQA—the first chemistry-specific triplet alignment dataset (31.6K samples)—via GPT-assisted three-stage data curation, paraphrase-based augmentation, and domain-adaptive fine-tuning. Contribution/Results: The released LibraChem model achieves +13.44%, +7.16%, and +7.10% improvements over Claude-3, GPT-4o, and LLaMA-3, respectively, on our custom safety–utility benchmark. It significantly mitigates over-refusal while preserving response quality, offering a reproducible methodology and high-quality resources for domain-specific alignment.
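The summary's "DPO-based collaborative alignment" rests on the standard Direct Preference Optimization loss over (prompt, chosen, rejected) triplets like those in LibraChemQA. A minimal sketch of that loss for one preference pair (the log-probability values below are purely illustrative, not from the paper):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Direct Preference Optimization loss for one preference pair.

    Each argument is a total sequence log-probability: the trained policy's
    and the frozen reference model's log-prob of the chosen (preferred) and
    rejected responses. Lower loss means the policy favors the chosen
    response more strongly than the reference model does.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Loss is -log(sigmoid(beta * margin)).
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative numbers: the policy prefers the chosen response more than the
# reference does, so the loss drops below -log(0.5) ~= 0.693.
loss = dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
                ref_logp_chosen=-13.0, ref_logp_rejected=-14.0, beta=0.1)
```

With a zero margin the loss equals log 2; a positive margin pushes it toward zero, which is what training on safety/utility preference triplets optimizes.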

📝 Abstract
Recent years have witnessed extensive efforts to enhance Large Language Models (LLMs) across various domains, alongside growing attention to their ethical implications. However, a critical challenge remains largely overlooked: LLMs must balance rejecting harmful requests for safety against accommodating legitimate ones for utility. This paper presents a Direct Preference Optimization (DPO) based alignment framework that achieves better overall performance by addressing this ethical-utility trade-off, using chemical domain applications as a proof of concept. Our alignment pipeline starts with a GPT-assisted three-phase data generation scheme, in which we create LibraChemQA, a chemical question-answering dataset comprising 31.6k triplet instances. By incorporating an innovative balanced seed in the data generation process, our framework systematically considers both legitimate and illegitimate requests. The framework also introduces a rephrasing mechanism for efficient data augmentation that enhances the model's chemical comprehension. We further develop a novel hybrid evaluation scheme with LLM judges for precise assessment of both safety and utility. Experimental results demonstrate our model's substantial improvements in overall performance where both safety and utility are considered: our resulting model, LibraChem, outperforms leading LLMs including Claude-3, GPT-4o, and LLaMA-3 by margins of 13.44%, 7.16%, and 7.10% respectively on our released benchmark.
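The trade-off the abstract describes can be scored with a simple joint metric: a model is counted correct on a harmful query only if it refuses, and on a legitimate query only if it answers. A minimal sketch of such scoring (the keyword-based refusal detector is a crude stand-in for the paper's LLM-as-judge hybrid scheme, and all names and example responses are illustrative):

```python
REFUSAL_MARKERS = ("i cannot", "i can't", "i won't", "unable to assist")

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic; the paper uses LLM judges for this step."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def joint_accuracy(examples) -> float:
    """Score a list of (response, is_harmful_query) pairs.

    A harmful query is correct when refused (safety);
    a legitimate query is correct when answered (utility),
    so over-refusal is penalized alongside unsafe compliance.
    """
    correct = sum(is_refusal(resp) == harmful for resp, harmful in examples)
    return correct / len(examples)

score = joint_accuracy([
    ("I cannot help with synthesizing that compound.", True),   # safe refusal
    ("Ethanol boils at roughly 78 C at 1 atm.", False),         # helpful answer
    ("I can't discuss the boiling point of ethanol.", False),   # over-refusal
])
# score reflects 2 of 3 correct
```

Scoring both query types under one accuracy number is what lets a benchmark penalize over-refusal instead of rewarding blanket safety.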
Problem

Research questions and friction points this paper is trying to address.

Ethical Safety
Practical Efficiency
Chemical Domain Application

Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct Preference Optimization
Chemical Question-Answering Dataset
Safety-Utility Balance