The performances of the Chinese and U.S. Large Language Models on the Topic of Chinese Culture

📅 2026-01-06
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study presents the first systematic comparison of mainstream large language models from China and the United States in their comprehension of Chinese cultural contexts, focusing on traditional Chinese content such as history, literature, and classical poetry. Using a direct-questioning paradigm, the evaluation covers models including GPT-5.1, Gemini 2.5 Pro, DeepSeek-V3.2, and Qwen3-Max. The results show that Chinese-developed models significantly outperform their American counterparts overall, while among the U.S. models, Gemini 2.5 Pro and GPT-5.1 exhibit relatively stronger performance. The findings underscore the influence of training data distribution and localization strategies on models' cultural understanding, offering empirical evidence to inform cross-cultural alignment in artificial intelligence.

๐Ÿ“ Abstract
Cultural backgrounds shape individuals' perspectives and approaches to problem-solving. Since the emergence of GPT-1 in 2018, large language models (LLMs) have undergone rapid development. To date, the world's ten leading LLM developers are primarily based in China and the United States. To examine whether LLMs released by Chinese and U.S. developers exhibit cultural differences in Chinese-language settings, we evaluate their performance on questions about Chinese culture. This study adopts a direct-questioning paradigm to evaluate models such as GPT-5.1, DeepSeek-V3.2, Qwen3-Max, and Gemini 2.5 Pro. We assess their understanding of traditional Chinese culture, including history, literature, poetry, and related domains. Comparative analyses between LLMs developed in China and the U.S. indicate that Chinese models generally outperform their U.S. counterparts on these tasks. Among U.S.-developed models, Gemini 2.5 Pro and GPT-5.1 achieve relatively higher accuracy. The observed performance differences may arise from variations in training data distribution, localization strategies, and the degree of emphasis on Chinese cultural content during model development.
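The direct-questioning paradigm described above can be sketched as a small evaluation loop: pose each cultural question to a model, compare its answer to a gold reference, and report accuracy per model. The sketch below is a minimal illustration under stated assumptions, not the paper's actual harness; `ask_model` is a hypothetical stub standing in for a real model API call, and the single question-answer pair is an invented example.

```python
# Minimal sketch of a direct-questioning accuracy evaluation.
# Assumption: `ask_model` is a hypothetical stand-in for a real LLM API
# call; a real harness would query GPT-5.1, DeepSeek-V3.2, etc. here.

def ask_model(model_name: str, question: str) -> str:
    # Hypothetical stub: returns a canned answer so the sketch runs offline.
    canned = {"Who wrote the poem 'Quiet Night Thought' (静夜思)?": "李白"}
    return canned.get(question, "")

def evaluate(model_name: str, qa_pairs: list[tuple[str, str]]) -> float:
    """Return the fraction of questions the model answers exactly right."""
    correct = sum(
        ask_model(model_name, question).strip() == gold.strip()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)

# Invented example item; the paper's question set covers history,
# literature, and classical poetry.
qa = [("Who wrote the poem 'Quiet Night Thought' (静夜思)?", "李白")]
accuracy = evaluate("DeepSeek-V3.2", qa)
```

A real study would also need answer normalization beyond exact string matching (e.g. handling synonyms or traditional/simplified character variants), which this sketch omits for brevity.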
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Chinese Culture
Cultural Differences
Model Evaluation
Cross-cultural AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Cultural Bias
Chinese Culture
Model Evaluation
Cross-cultural Comparison
Feiyan Liu
School of Physics and Information Technology, Shaanxi Normal University, Xi'an
Chenxun Zhuo
School of Foreign Language, Northwest University, Xi'an
Siyan Zhao
University of California Los Angeles
Large Language Models · Reinforcement Learning · Machine Learning
Bao Ge
School of Physics and Information Technology, Shaanxi Normal University, Xi'an
Tianming Liu
Distinguished Research Professor of Computer Science, University of Georgia
Brain · Brain-Inspired AI · LLM · Artificial General Intelligence · Quantum AI