OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

📅 2025-03-27
🤖 AI Summary
Existing evaluations of large language models (LLMs) lack a rigorous assessment of their performance in Hungarian linguistic and cultural contexts. Method: We introduce OpenHuEval, the first Hungarian-specific LLM benchmark. It comprises 3,953 questions across eight Hungarian language-and-culture dimensions and five tasks. The construction process:
- draws on diverse Hungarian corpora;
- samples real user queries;
- designs tasks around generative capabilities.

Answers are scored by an LLM-as-judge framework using fine-grained, multidimensional criteria. A dedicated framework analyzes the thinking processes of large reasoning models (LRMs). Contribution/Results: Evaluating mainstream LLMs and LRMs shows a clear need for evaluation and optimization tailored to Hungarian. The benchmark data, code, and analysis framework are to be released openly, supporting more rigorous and interpretable evaluation of LLMs in non-English languages.
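The LLM-as-judge scoring step summarized above can be pictured with a minimal Python sketch. In this sketch, each model answer is scored by a judge callable, and scores are averaged per dimension. Everything in it is a hypothetical illustration, not OpenHuEval's actual code, prompts, or metrics:
- the `Item` record;
- the `evaluate` and `substring_judge` functions;
- the placeholder dimension names `dim_a` and `dim_b`;
- the toy questions.

The substring-match judge stands in for a real judge model.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass
class Item:
    dimension: str  # hypothetical placeholder label, not one of the paper's dimensions
    question: str
    reference: str  # hypothetical reference answer shown to the judge


# A judge maps (question, reference, answer) to a score in [0, 1].
Judge = Callable[[str, str, str], float]


def evaluate(items: List[Item], answer_fn: Callable[[str], str], judge: Judge) -> Dict[str, float]:
    """Ask the model under test each question, let the judge score the answer,
    and return the mean score per dimension."""
    per_dim: Dict[str, List[float]] = {}
    for item in items:
        answer = answer_fn(item.question)
        score = judge(item.question, item.reference, answer)
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"judge score out of range: {score}")
        per_dim.setdefault(item.dimension, []).append(score)
    return {dim: mean(scores) for dim, scores in per_dim.items()}


def substring_judge(question: str, reference: str, answer: str) -> float:
    # Hypothetical stand-in for an LLM judge: a real setup would prompt a judge
    # model with the question, reference, and answer, then parse its verdict.
    return 1.0 if reference.lower() in answer.lower() else 0.0


# Toy data and canned "model" answers, for illustration only.
items = [
    Item("dim_a", "Mi Magyarország fővárosa?", "Budapest"),
    Item("dim_b", "Melyik folyó szeli át Budapestet?", "Duna"),
    Item("dim_b", "Hány betűből áll a magyar ábécé?", "44"),
]
canned = {
    "Mi Magyarország fővárosa?": "A főváros Budapest.",
    "Melyik folyó szeli át Budapestet?": "A Duna.",
    "Hány betűből áll a magyar ábécé?": "26 betűből.",
}

scores = evaluate(items, canned.get, substring_judge)
print(scores)  # {'dim_a': 1.0, 'dim_b': 0.5}
```

Only two substitutions would turn this loop into a real setup: replacing `substring_judge` with a call to a judge model, and `canned.get` with the model under test. Keeping every judge score in [0, 1] keeps the per-dimension averages comparable.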

📝 Abstract
We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and its specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials drawn from multiple sources. During construction, we incorporated the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs' generative capabilities, and employing LLM-as-judge to enhance the multidimensionality and accuracy of evaluations. Ultimately, OpenHuEval encompasses eight Hungarian-specific dimensions, featuring five tasks and 3953 questions. Consequently, OpenHuEval provides a comprehensive, in-depth, and scientifically accurate assessment of LLM performance in the context of the Hungarian language and its specifics. We evaluated current mainstream LLMs, including both traditional LLMs and recently developed Large Reasoning Models (LRMs). The results demonstrate a significant need for evaluation and model optimization tailored to the Hungarian language and its specifics. We also established a framework for analyzing the thinking processes of LRMs with OpenHuEval, revealing intrinsic patterns and mechanisms of these models in non-English languages, with Hungarian serving as a representative example. We will release OpenHuEval at https://github.com/opendatalab/OpenHuEval.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs on Hungarian language specifics
Assessing generative capabilities using real user queries
Analyzing thinking processes of LRMs in non-English languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Hungarian-specific LLM benchmark
Uses real user queries for evaluation
Employs LLM-as-judge for accuracy
Authors
Haote Yang, PJLab (CV, LLM, MLLM, AI4S)
Xingjian Wei, Shanghai AI Lab (data-centric AI, LLM, VLM, Engineer)
Jiang Wu, Shanghai Artificial Intelligence Laboratory
Noémi Ligeti-Nagy, HUN-REN Hungarian Research Centre for Linguistics
Jiaxing Sun, Wuhan University
Yinfan Wang, PJLab (Engineer)
Zijian Győző Yang, HUN-REN Hungarian Research Centre for Linguistics
Junyuan Gao, University of Chinese Academy of Sciences
Jingchao Wang, East China Normal University (AI)
Bowen Jiang, University of Pennsylvania; Microsoft Corporation (Artificial Intelligence, Post-training, Personalization, Multimodality)
Shasha Wang, Shanghai Artificial Intelligence Laboratory
Nanjun Yu, East China Normal University
Zihao Zhang, Tianjin University (computer vision)
Shixin Hong, Tsinghua University
Hongwei Liu, Shanghai Artificial Intelligence Laboratory
Wei Li, Shanghai Artificial Intelligence Laboratory
Songyang Zhang, Shanghai Artificial Intelligence Laboratory
Dahua Lin, The Chinese University of Hong Kong (computer vision, machine learning, probabilistic inference, Bayesian nonparametrics)
Lijun Wu, Shanghai AI Laboratory (ML, LLM, AI4Science)
Gábor Prószéky, HUN-REN Hungarian Research Centre for Linguistics
Conghui He, Shanghai AI Laboratory (data-centric AI, LLM, document intelligence)