aiXamine: LLM Safety and Security Simplified

📅 2025-04-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
LLM safety and security evaluation has long suffered from heterogeneous benchmarks, fragmented metrics, and inconsistent reporting. Method: We present aiXamine, a unified black-box evaluation platform for LLM safety and security that integrates over 40 tests organized into eight services: adversarial robustness, code security, fairness and bias, hallucination, model and data privacy, out-of-distribution (OOD) robustness, over-refusal, and safety alignment. The platform standardizes heterogeneous benchmarks, metrics, and report formats, and produces a single structured report per model with performance breakdowns, test examples, and rich visualizations. Contribution/Results: We conducted over 2,000 evaluations across 50+ publicly available and proprietary models. The analysis reveals adversarial susceptibility in OpenAI's GPT-4o, biased outputs in xAI's Grok-3, and privacy weaknesses in Google's Gemini 2.0, while open-source models can match or exceed proprietary ones on safety alignment, fairness and bias, and OOD robustness. These findings provide empirical evidence for informed trade-offs in LLM safety design and deployment.

📝 Abstract
Evaluating Large Language Models (LLMs) for safety and security remains a complex task, often requiring users to navigate a fragmented landscape of ad hoc benchmarks, datasets, metrics, and reporting formats. To address this challenge, we present aiXamine, a comprehensive black-box evaluation platform for LLM safety and security. aiXamine integrates over 40 tests (i.e., benchmarks) organized into eight key services targeting specific dimensions of safety and security: adversarial robustness, code security, fairness and bias, hallucination, model and data privacy, out-of-distribution (OOD) robustness, over-refusal, and safety alignment. The platform aggregates the evaluation results into a single detailed report per model, providing a detailed breakdown of model performance, test examples, and rich visualizations. We used aiXamine to assess over 50 publicly available and proprietary LLMs, conducting over 2K examinations. Our findings reveal notable vulnerabilities in leading models, including susceptibility to adversarial attacks in OpenAI's GPT-4o, biased outputs in xAI's Grok-3, and privacy weaknesses in Google's Gemini 2.0. Additionally, we observe that open-source models can match or exceed proprietary models in specific services such as safety alignment, fairness and bias, and OOD robustness. Finally, we identify trade-offs between distillation strategies, model size, training methods, and architectural choices.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM safety and security requires navigating a fragmented landscape of ad hoc benchmarks, datasets, metrics, and reporting formats
No single platform consolidates these heterogeneous tests into a comparable per-model assessment
It is unclear where leading models remain vulnerable and how open-source models compare with proprietary ones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive black-box evaluation platform for LLM safety and security
Integrates over 40 tests organized into eight services, from adversarial robustness to safety alignment
Aggregates results into a single per-model report with performance breakdowns, test examples, and rich visualizations
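The service-oriented design described above can be sketched as a minimal black-box evaluation loop: tests grouped into services, each scoring only the model's text-in/text-out interface, with per-service scores aggregated into a report. This is an illustrative sketch under assumed names (`Test`, `Service`, `evaluate`), not aiXamine's actual API.

```python
# Hypothetical sketch of a service-oriented black-box evaluation loop.
# Names and structure are illustrative, not the platform's actual API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Test:
    name: str
    prompts: List[str]
    # Scorer maps a model response to a score in [0, 1].
    scorer: Callable[[str], float]

@dataclass
class Service:
    name: str                      # e.g. "adversarial robustness"
    tests: List[Test] = field(default_factory=list)

def evaluate(model: Callable[[str], str],
             services: List[Service]) -> Dict[str, Dict[str, float]]:
    """Black-box evaluation: only the model's text interface is queried."""
    report: Dict[str, Dict[str, float]] = {}
    for service in services:
        scores = {}
        for test in service.tests:
            per_prompt = [test.scorer(model(p)) for p in test.prompts]
            scores[test.name] = sum(per_prompt) / len(per_prompt)
        report[service.name] = scores
    return report

# Toy usage: a stub "model" that refuses everything, and one over-refusal test.
stub_model = lambda prompt: "I cannot help with that."
over_refusal = Service("over-refusal", [
    Test("benign-requests",
         ["How do I boil an egg?", "Summarize this paragraph."],
         scorer=lambda r: 0.0 if "cannot" in r.lower() else 1.0),
])
print(evaluate(stub_model, [over_refusal]))
# {'over-refusal': {'benign-requests': 0.0}}
```

Grouping tests under named services keeps the per-model report structured by dimension, which is what lets such a platform compare models service by service rather than on one aggregate number.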