🤖 AI Summary
Clinical chatbots face serious safety challenges: without reliable safe-response generation, they risk misdiagnosis or harmful outputs. To address this, we propose TACOS, a fine-grained 21-class safety taxonomy designed specifically for clinical conversational agents. TACOS is the first framework to jointly model safety control and tool invocation within user intent recognition, enabling differentiated safety thresholds for clinical versus non-clinical queries and explicit modeling of external tool dependencies. Leveraging a human-annotated TACOS dataset, we develop an end-to-end safety intent classifier and tool router built on pretrained language models, supporting coordinated multi-level safety policies. Experiments demonstrate substantial improvements in safe-response accuracy. Our analysis further shows that both the training data distribution and the base model's prior knowledge critically influence safety performance, two factors previously underexplored in clinical LLM safety research.
📝 Abstract
Safety is a paramount concern in clinical chatbot applications, where inaccurate or harmful responses can lead to serious consequences. Existing methods, such as guardrails and tool calling, often fall short of the nuanced demands of the clinical domain. In this paper, we introduce TACOS (TAxonomy of COmprehensive Safety for Clinical Agents), a fine-grained, 21-class taxonomy that integrates safety filtering and tool selection into a single user intent classification step. The taxonomy covers a wide spectrum of clinical and non-clinical queries, explicitly modeling varying safety thresholds and external tool dependencies. To validate our framework, we curate a TACOS-annotated dataset and perform extensive experiments. Our results demonstrate the value of a taxonomy specialized for clinical agent settings, and reveal useful insights about training data distribution and the pretrained knowledge of base models.
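The core idea, collapsing safety filtering and tool selection into one intent-classification step whose predicted class determines both the safety policy and any external tool, can be sketched as below. This is a minimal illustration only: the class names, policies, and tools are invented placeholders, the keyword classifier stands in for the paper's fine-tuned language model, and none of it reproduces the actual 21-class TACOS taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Route:
    """Outcome of the single intent-classification step."""
    safety_policy: str          # e.g. answer / answer_with_disclaimer / refuse_and_redirect
    tool: Optional[str] = None  # external tool dependency, if the intent requires one

# Toy routing table keyed by intent class (hypothetical class names, not TACOS classes)
ROUTES = {
    "clinical_drug_interaction": Route("answer_with_disclaimer", tool="drug_db_lookup"),
    "clinical_emergency":        Route("refuse_and_redirect"),
    "nonclinical_smalltalk":     Route("answer"),
}

def classify_intent(query: str) -> str:
    """Stand-in for the fine-tuned LM classifier: simple keyword rules."""
    q = query.lower()
    if "interact" in q or "drug" in q:
        return "clinical_drug_interaction"
    if "chest pain" in q or "emergency" in q:
        return "clinical_emergency"
    return "nonclinical_smalltalk"

def route(query: str) -> Route:
    """One classification call yields both the safety policy and the tool."""
    return ROUTES[classify_intent(query)]
```

The point of the sketch is the shape of the interface: because safety threshold and tool dependency are both properties of the predicted intent class, the agent needs only one classification pass per query instead of separate guardrail and tool-selection stages.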