SafeSci: Safety Evaluation of Large Language Models in Science Domains and Beyond

📅 2026-03-02

📈 Citations: 0

✨ Influential: 0

career value

232K/year

🤖 AI Summary

This work addresses the limitations of existing safety evaluation benchmarks for large language models in scientific domains, which often suffer from insufficient risk coverage and reliance on subjective judgments. To this end, we propose SafeSci, a comprehensive framework comprising the multidisciplinary safety benchmark SafeSciBench and the safety-enhanced training dataset SafeSciTrain. SafeSci introduces a novel evaluation paradigm that explicitly distinguishes between safe knowledge and hazardous content, employs verifiable questions to minimize assessment bias, and emphasizes that scientific safety must be context-dependent. Using objective metrics, we evaluate 24 prominent large language models, uncovering critical safety vulnerabilities alongside tendencies toward excessive refusal. Furthermore, fine-tuning with SafeSciTrain substantially improves models’ safety alignment in scientific contexts.

Technology Category

Application Category

📝 Abstract

The success of large language models (LLMs) in scientific domains has heightened safety concerns, prompting numerous benchmarks to evaluate their scientific safety. Existing benchmarks often suffer from limited risk coverage and a reliance on subjective evaluation. To address these problems, we introduce SafeSci, a comprehensive framework for safety evaluation and enhancement in scientific contexts. SafeSci comprises SafeSciBench, a multi-disciplinary benchmark with 0.25M samples, and SafeSciTrain, a large-scale dataset containing 1.5M samples for safety enhancement. SafeSciBench distinguishes between safety knowledge and risk to cover extensive scopes and employs objective metrics such as deterministically answerable questions to mitigate evaluation bias. We evaluate 24 advanced LLMs, revealing critical vulnerabilities in current models. We also observe that LLMs exhibit varying degrees of excessive refusal behaviors on safety-related issues. For safety enhancement, we demonstrate that fine-tuning on SafeSciTrain significantly enhances the safety alignment of models. Finally, we argue that knowledge is a double-edged sword, and determining the safety of a scientific question should depend on specific context, rather than universally categorizing it as safe or unsafe. Our work provides both a diagnostic tool and a practical resource for building safer scientific AI systems.

Problem

Research questions and friction points this paper is trying to address.

safety evaluation

large language models

scientific domains

benchmark limitations

evaluation bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

SafeSci

safety evaluation

objective metrics

safety alignment

context-dependent safety

🔎 Similar Papers

No similar papers found.