🤖 AI Summary
The rapid advancement of generative AI, particularly large language models (LLMs), introduces new risks to public safety and national security. Method: The paper proposes a novel architectural framework that defines AI safety from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. Drawing on an extensive review of current research, it establishes an extensible safety analysis paradigm that unifies risk categorization and mitigation approaches. The framework spans safety evaluation, adversarial testing, explainability analysis, and multi-tiered verification, with design and testing mechanisms illustrated through LLM examples. Contribution/Results: It delivers a lifecycle-spanning AI safety guideline intended to support trustworthy deployment in high-risk applications and to strengthen public trust amid digital transformation.
📝 Abstract
AI Safety is an emerging area of critical importance to the safe adoption and deployment of AI systems. With the rapid proliferation of AI, and especially with the recent advancement of Generative AI (or GAI), the technology ecosystem behind the design, development, adoption, and deployment of AI systems has changed drastically, broadening the scope of AI Safety to address impacts on public safety and national security. In this paper, we propose a novel architectural framework for understanding and analyzing AI Safety, defining its characteristics from three perspectives: Trustworthy AI, Responsible AI, and Safe AI. We provide an extensive review of current research and advancements in AI safety from these perspectives, highlighting their key challenges and mitigation approaches. Through examples from state-of-the-art technologies, particularly Large Language Models (LLMs), we present innovative mechanisms, methodologies, and techniques for designing and testing AI safety. Our goal is to promote advancement in AI safety research and ultimately to enhance people's trust in digital transformation.