🤖 AI Summary
Existing safety evaluation benchmarks for Chinese large language models (LLMs) lack dynamism and cultural adaptability, failing to reflect evolving legal, ethical, and societal norms. Method: This paper introduces LiveSecBench, the first multidimensional, dynamic safety evaluation framework tailored to Chinese law and social conventions. It assesses six dimensions: legality, ethics, factual consistency, privacy protection, adversarial robustness, and reasoning safety. Innovations include real-time threat monitoring, versioned benchmark updates, and a human–machine collaborative evaluation pipeline, with built-in extensibility for emerging modalities (e.g., text-to-image generation) and agent-based systems. Contribution/Results: The v251030 release systematically evaluates 18 mainstream Chinese LLMs and hosts a publicly accessible, reproducible real-time leaderboard, significantly enhancing the timeliness, objectivity, and practical utility of safety assessment.
📝 Abstract
In this work, we propose LiveSecBench, a dynamic and continuously updated safety benchmark specifically for Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety) rooted in Chinese legal and social frameworks. The benchmark maintains relevance through a dynamic update schedule that incorporates new threat vectors, such as the planned inclusion of Text-to-Image Generation Safety and Agentic Safety in the next update. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety in the Chinese-language context. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.
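The six evaluation dimensions above can be pictured as a per-model leaderboard record. The sketch below is purely illustrative: the field names, the `LeaderboardEntry` class, and the equal-weight average are assumptions for exposition, not the benchmark's actual schema or scoring rule.

```python
from dataclasses import dataclass, field

# The six dimensions named in the abstract (identifiers are assumptions).
DIMENSIONS = [
    "legality", "ethics", "factuality",
    "privacy", "adversarial_robustness", "reasoning_safety",
]

@dataclass
class LeaderboardEntry:
    model: str
    version: str                                 # benchmark release, e.g. "v251030"
    scores: dict = field(default_factory=dict)   # dimension -> pass rate in [0, 1]

    def overall(self) -> float:
        """Unweighted mean over the six dimensions (an assumed aggregation)."""
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Hypothetical usage with a placeholder model name and uniform scores.
entry = LeaderboardEntry(
    model="example-llm",
    version="v251030",
    scores={d: 0.9 for d in DIMENSIONS},
)
print(round(entry.overall(), 2))
```

A versioned `version` field mirrors the paper's dated releases, so successive benchmark updates can coexist on one leaderboard.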