Exploring the Capability Boundaries of LLMs in Mastering of Chinese Chouxiang Language

📅 2026-04-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the challenge that Chouxiang Language (literally "abstract language"), a highly dynamic and context-dependent form of Chinese internet subcultural discourse, poses to the comprehension and generation capabilities of large language models (LLMs). To systematically evaluate mainstream LLMs, the authors introduce Mouse, the first multi-task benchmark specifically designed for Chouxiang Language, encompassing six understanding and generation tasks. Evaluation combines LLM-as-a-judge scoring, human assessment, and error attribution analysis. Results indicate that current state-of-the-art models perform well on tasks involving contextual semantic understanding but show clear limitations on multiple other tasks. The study also reports a notable misalignment between LLM-as-a-judge scores and human judgments and values. The code and dataset are publicly released to foster further research in this domain.

📝 Abstract
While large language models (LLMs) have achieved remarkable success in general language tasks, their performance on Chouxiang Language, a representative subcultural language of the Chinese internet, remains largely unexplored. In this paper, we introduce Mouse, a specialized benchmark that evaluates the capabilities of LLMs on six NLP tasks involving Chouxiang Language. Experimental results show that current state-of-the-art (SOTA) LLMs exhibit clear limitations on multiple tasks, while performing well on tasks that involve contextual semantic understanding. We also discuss the reasons behind the generally low performance of SOTA LLMs on Chouxiang Language, examine whether the LLM-as-a-judge approach adopted for the translation tasks aligns with human judgments and values, and analyze the key factors that influence Chouxiang translation. Our study aims to promote further research in the NLP community on multicultural integration and the dynamics of evolving internet languages. Our code and data are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Chouxiang Language
large language models
subcultural language
Chinese internet language
NLP benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chouxiang Language
Large Language Models
Benchmarking
Subcultural Language
LLM-as-a-judge