SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents

📅 2025-05-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language model (LLM) agents can significantly accelerate scientific discovery, but they also pose serious ethical and safety risks. To address this, the authors propose SafeScientist, an end-to-end, risk-aware AI-scientist framework for safe AI-driven scientific research. Alongside it, they introduce SciSafetyBench, the first safety evaluation benchmark tailored to scientific workflows, comprising 240 high-risk tasks across six domains, 30 scientific tools, and 120 tool-use risk scenarios. The framework integrates four defensive components: prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer, and is further validated for adversarial robustness. Experiments show a 35% improvement in safety performance, measured via risk detection and mitigation accuracy, over traditional AI-scientist frameworks without compromising scientific output quality. The pipeline also remains robust against diverse adversarial attacks, including jailbreaking, prompt injection, and tool misuse. This work establishes a foundational methodology for trustworthy, safety-first AI-assisted science.
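The layered defense described above (prompt monitoring, agent-collaboration oversight, tool-use auditing, ethical review) can be pictured as a veto chain in which a research task proceeds only if every stage allows it. The sketch below is a hypothetical illustration of that defense-in-depth pattern; the class and function names are invented for this example and are not the paper's actual API.

```python
# Minimal sketch of a layered safety pipeline in the spirit of SafeScientist.
# All names here (Verdict, prompt_monitor, tool_audit, run_pipeline) are
# hypothetical illustrations, not the paper's actual implementation.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


# Each defense stage inspects the task description and may veto it.
Stage = Callable[[str], Verdict]


def prompt_monitor(task: str) -> Verdict:
    # Toy keyword filter standing in for an LLM-based prompt monitor.
    banned = ("synthesize pathogen", "weaponize")
    for phrase in banned:
        if phrase in task.lower():
            return Verdict(False, f"prompt monitor flagged: {phrase!r}")
    return Verdict(True)


def tool_audit(task: str) -> Verdict:
    # Placeholder: a real auditor would inspect each tool invocation.
    return Verdict(True)


def run_pipeline(task: str, stages: List[Stage]) -> Verdict:
    # Defense in depth: the first stage that objects blocks the task.
    for stage in stages:
        verdict = stage(task)
        if not verdict.allowed:
            return verdict
    return Verdict(True, "all stages passed")


print(run_pipeline("weaponize a toxin", [prompt_monitor, tool_audit]).allowed)   # False
print(run_pipeline("measure pH of a buffer", [prompt_monitor, tool_audit]).allowed)  # True
```

The key design choice is that stages are independent and composable, so an ethical-reviewer or collaboration-monitor stage could be appended without changing the pipeline logic.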

📝 Abstract
Recent advancements in large language model (LLM) agents have significantly accelerated scientific discovery automation, yet concurrently raised critical ethical and safety concerns. To systematically address these challenges, we introduce **SafeScientist**, an innovative AI scientist framework explicitly designed to enhance safety and ethical responsibility in AI-driven scientific exploration. SafeScientist proactively refuses ethically inappropriate or high-risk tasks and rigorously emphasizes safety throughout the research process. To achieve comprehensive safety oversight, we integrate multiple defensive mechanisms, including prompt monitoring, agent-collaboration monitoring, tool-use monitoring, and an ethical reviewer component. Complementing SafeScientist, we propose **SciSafetyBench**, a novel benchmark specifically designed to evaluate AI safety in scientific contexts, comprising 240 high-risk scientific tasks across 6 domains, alongside 30 specially designed scientific tools and 120 tool-related risk tasks. Extensive experiments demonstrate that SafeScientist significantly improves safety performance by 35% compared to traditional AI scientist frameworks, without compromising scientific output quality. Additionally, we rigorously validate the robustness of our safety pipeline against diverse adversarial attack methods, further confirming the effectiveness of our integrated approach. The code and data will be available at https://github.com/ulab-uiuc/SafeScientist. Warning: this paper contains example data that may be offensive or harmful.
Problem

Research questions and friction points this paper is trying to address.

Addressing ethical and safety concerns in AI-driven scientific discovery
Enhancing risk-aware decision-making for LLM agents in research
Developing safety benchmarks for high-risk scientific AI tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proactive refusal of unethical high-risk tasks
Multiple defensive mechanisms for safety oversight
Novel benchmark for evaluating AI safety
Kunlun Zhu
University of Illinois at Urbana-Champaign
Large Language Models, Foundation Agents, Agents for Science, Agents Safety
Jiaxun Zhang
PhD, University of Macau
Autonomous Driving, Intelligent Transportation, Traffic Safety
Ziheng Qi
University of Illinois Urbana-Champaign
Nuoxing Shang
University of Illinois Urbana-Champaign
Zijia Liu
University of Illinois Urbana-Champaign
Peixuan Han
University of Illinois Urbana-Champaign
Yue Su
University of Illinois Urbana-Champaign
Haofei Yu
University of Illinois Urbana-Champaign
Jiaxuan You
Assistant Professor, UIUC CS
Foundation Models, GNN, Large Language Models