🤖 AI Summary
Existing safety evaluation benchmarks for Chinese large language models (LLMs) lack dynamism and cultural adaptability, failing to reflect evolving legal, ethical, and societal norms. Method: This paper introduces LiveSecBench, the first multidimensional, dynamic safety evaluation framework tailored to Chinese law and social conventions. It assesses six dimensions: legality, ethics, factual consistency, privacy protection, adversarial robustness, and reasoning safety. Innovations include real-time threat monitoring, versioned benchmark updates, and a human–machine collaborative evaluation pipeline, with built-in extensibility for emerging modalities (e.g., text-to-image generation) and agent-based systems. Contribution/Results: The v251030 release systematically evaluates 18 mainstream Chinese LLMs and hosts a publicly accessible, reproducible real-time leaderboard, significantly enhancing the timeliness, objectivity, and practical utility of safety assessment.
📝 Abstract
In this work, we propose LiveSecBench, a dynamic and continuously updated safety benchmark specifically for Chinese-language LLM application scenarios. LiveSecBench evaluates models across six critical dimensions (Legality, Ethics, Factuality, Privacy, Adversarial Robustness, and Reasoning Safety) rooted in Chinese legal and social frameworks. The benchmark maintains relevance through a dynamic update schedule that incorporates new threat vectors, such as the planned inclusion of Text-to-Image Generation Safety and Agentic Safety in the next update. To date, LiveSecBench (v251030) has evaluated 18 LLMs, providing a landscape of AI safety in the Chinese-language context. The leaderboard is publicly accessible at https://livesecbench.intokentech.cn/.
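The six evaluation dimensions above can be pictured as a per-model leaderboard record. The sketch below is purely illustrative: the field names, the `LeaderboardEntry` class, and the equal-weight average are assumptions for exposition, not the benchmark's actual schema or scoring rule.

```python
from dataclasses import dataclass, field

# The six dimensions named in the abstract (identifiers are assumptions).
DIMENSIONS = [
    "legality", "ethics", "factuality",
    "privacy", "adversarial_robustness", "reasoning_safety",
]

@dataclass
class LeaderboardEntry:
    model: str
    version: str                                 # benchmark release, e.g. "v251030"
    scores: dict = field(default_factory=dict)   # dimension -> pass rate in [0, 1]

    def overall(self) -> float:
        """Unweighted mean over the six dimensions (an assumed aggregation)."""
        return sum(self.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Hypothetical usage with a placeholder model name and uniform scores.
entry = LeaderboardEntry(
    model="example-llm",
    version="v251030",
    scores={d: 0.9 for d in DIMENSIONS},
)
print(round(entry.overall(), 2))
```

A versioned `version` field mirrors the paper's dated releases, so successive benchmark updates can coexist on one leaderboard.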