A Safety Report on GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5

📅 2026-01-15
🤖 AI Summary
It remains unclear whether the safety of large language and multimodal models improves in tandem with their capabilities, and a unified, cross-modal evaluation framework is currently lacking. This work proposes the first cross-modal, multidimensional safety assessment framework that integrates benchmarking, adversarial attacks, multilingual robustness, and regulatory compliance evaluations to construct comprehensive safety profiles for six state-of-the-art models across text, vision-language, and image generation tasks. The results reveal significant heterogeneity and trade-offs in safety performance across dimensions: GPT-5.2 demonstrates the most balanced profile, while the other models exhibit notable weaknesses in adversarial robustness, multilingual generalization, or compliance. Critically, all models suffer substantial performance degradation under adversarial attacks.

📝 Abstract
The rapid evolution of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has driven major gains in reasoning, perception, and generation across language and vision, yet whether these advances translate into comparable improvements in safety remains unclear, partly due to fragmented evaluations that focus on isolated modalities or threat models. In this report, we present an integrated safety evaluation of six frontier models--GPT-5.2, Gemini 3 Pro, Qwen3-VL, Grok 4.1 Fast, Nano Banana Pro, and Seedream 4.5--assessing each across language, vision-language, and image generation using a unified protocol that combines benchmark, adversarial, multilingual, and compliance evaluations. By aggregating results into safety leaderboards and model profiles, we reveal a highly uneven safety landscape: while GPT-5.2 demonstrates consistently strong and balanced performance, other models exhibit clear trade-offs across benchmark safety, adversarial robustness, multilingual generalization, and regulatory compliance. Despite strong results under standard benchmarks, all models remain highly vulnerable under adversarial testing, with worst-case safety rates dropping below 6%. Text-to-image models show slightly stronger alignment in regulated visual risk categories, yet remain fragile when faced with adversarial or semantically ambiguous prompts. Overall, these findings highlight that safety in frontier models is inherently multidimensional--shaped by modality, language, and evaluation design--underscoring the need for standardized, holistic safety assessments to better reflect real-world risk and guide responsible deployment.
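The abstract describes aggregating per-dimension results (benchmark, adversarial, multilingual, compliance) into safety leaderboards and model profiles. The following is an illustrative sketch of such an aggregation, not the authors' actual scoring code; the dimension names, example rates, and the choice of mean and worst-case aggregates are assumptions for demonstration only.

```python
# Illustrative sketch (assumed, not from the paper): combine per-dimension
# safety rates into a composite profile for a leaderboard. The report's
# finding that worst-case adversarial safety can drop below 6% motivates
# tracking the minimum alongside the mean.
from statistics import mean


def safety_profile(scores: dict[str, float]) -> dict[str, float]:
    """Return the per-dimension rates plus mean and worst-case aggregates."""
    return {
        **scores,
        "mean_safety": mean(scores.values()),
        "worst_case": min(scores.values()),  # adversarial floor dominates risk
    }


# Hypothetical example rates for one model (values are placeholders):
profile = safety_profile({
    "benchmark": 0.97,
    "adversarial": 0.06,
    "multilingual": 0.88,
    "compliance": 0.91,
})
```

A profile like this makes the paper's central point visible at a glance: a model can look strong on average while its worst-case dimension tells a very different story.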
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Multimodal Safety
Adversarial Evaluation
Safety Benchmarking
Model Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Safety Evaluation
Adversarial Alignment
Unified Safety Benchmarking
Large Language Models
Model Compliance