SafeGPT: Preventing Data Leakage and Unethical Outputs in Enterprise LLM Use

πŸ“… 2026-01-10
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the critical security and ethical risks arising from enterprise employees inadvertently leaking sensitive data or generating policy-violating, unethical content when using large language models. To mitigate these risks, we propose SafeGPTβ€”the first unified dual-sided protection framework that integrates input-side sensitive information detection and sanitization with output-side content moderation and rewriting. SafeGPT further incorporates a human-in-the-loop feedback mechanism to jointly optimize safety and user experience. By combining red-teaming attacks with reinforcement learning from human feedback, the system significantly reduces the likelihood of data leakage and biased outputs while maintaining high user satisfaction.

πŸ“ Abstract
Large Language Models (LLMs) are transforming enterprise workflows but introduce security and ethics challenges when employees inadvertently share confidential data or generate policy-violating content. This paper proposes SafeGPT, a two-sided guardrail system that prevents sensitive data leakage and unethical outputs. SafeGPT integrates input-side detection and redaction, output-side moderation and reframing, and human-in-the-loop feedback. Experiments demonstrate that SafeGPT effectively reduces data leakage risk and biased outputs while maintaining user satisfaction.
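The dual-sided design described in the abstract can be pictured as a thin wrapper around an LLM call: sanitize the prompt before it leaves the enterprise boundary, then moderate the model's reply before it reaches the user. The sketch below is a hypothetical illustration only; the pattern lists, blocked terms, and function names are assumptions, not the paper's implementation.

```python
import re

# Input-side detectors (illustrative patterns, not the paper's detectors).
SENSITIVE_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "API_KEY": re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b"),
}

# Output-side policy phrases (illustrative stand-in for a moderation model).
BLOCKED_TERMS = {"credential dump", "bypass compliance"}

def sanitize_input(prompt: str) -> str:
    """Input-side guardrail: replace detected sensitive spans with placeholders."""
    for label, pattern in SENSITIVE_PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

def moderate_output(response: str) -> str:
    """Output-side guardrail: withhold responses containing blocked phrases."""
    lowered = response.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return "[response withheld by policy; please rephrase your request]"
    return response

def guarded_query(prompt: str, llm) -> str:
    """Dual-sided wrapper: sanitize the prompt, call the model, moderate the reply."""
    return moderate_output(llm(sanitize_input(prompt)))
```

In a real deployment the regexes would be replaced by a learned sensitive-information detector, the keyword filter by a moderation model, and flagged cases routed to the human-in-the-loop feedback channel the paper describes.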
Problem

Research questions and friction points this paper is trying to address.

data leakage
unethical outputs
enterprise LLMs
security
ethics
Innovation

Methods, ideas, or system contributions that make the work stand out.

guardrail system
data leakage prevention
output moderation
human-in-the-loop
enterprise LLM security
Pratyush Desai (Binghamton University)
Luoxi Tang (Binghamton University)
Yuqiao Meng (Binghamton University)
Zhaohan Xi (Binghamton University)
AI for Science · Large Language Models · Healthcare AI · Cybersecurity · AI Security