From Phonemes to Meaning: Evaluating Large Language Models on Tamil

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-resource, morphologically rich languages like Tamil lack native linguistic evaluation benchmarks, hindering reliable assessment of large language models (LLMs). Method: We introduce ILAKKANAM—the first linguistically grounded, culturally authentic Tamil evaluation benchmark—constructed from real Sri Lankan K–12 examination items. It covers five linguistic dimensions (morphology, syntax, semantics, pragmatics, and factual knowledge) via 820 expert-annotated, native-language questions organized within a grade-based difficulty framework to avoid the cultural and linguistic distortions introduced by English translation. Contribution/Results: Our systematic evaluation of leading closed- and open-weight LLMs reveals that Gemini 2.5 achieves the highest accuracy, while open-weight models consistently underperform. Accuracy declines markedly with increasing grade level (i.e., rising linguistic complexity), and improvements in linguistic competence show no strong correlation with language identification capability. ILAKKANAM establishes a reproducible, culturally grounded paradigm for evaluating LLMs in low-resource languages.

📝 Abstract
Large Language Models (LLMs) have shown strong generalization across tasks in high-resource languages; however, their linguistic competence in low-resource and morphologically rich languages such as Tamil remains largely unexplored. Existing multilingual benchmarks often rely on translated English datasets, failing to capture the linguistic and cultural nuances of the target language. To address this gap, we introduce ILAKKANAM, the first Tamil-specific linguistic evaluation benchmark manually curated using 820 questions from Sri Lankan school-level Tamil subject examination papers. Each question is annotated by trained linguists under five linguistic categories and a factual knowledge category, spanning Grades 1--13 to ensure broad linguistic coverage. We evaluate both closed-source and open-source LLMs using a standardized evaluation framework. Our results show that Gemini 2.5 achieves the highest overall performance, while open-source models lag behind, highlighting the gap in linguistic grounding. Category- and grade-wise analyses reveal that all models perform well on lower-grade questions but show a clear decline as linguistic complexity increases. Further, no strong correlation is observed between a model's overall performance and its ability to identify linguistic categories, suggesting that performance may be driven by exposure rather than genuine understanding.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM linguistic competence in low-resource Tamil language
Addressing limitations of translated datasets for Tamil linguistic nuances
Assessing LLM performance across complexity levels and linguistic categories
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Tamil-specific linguistic evaluation benchmark
Manual curation using school examination questions
Standardized framework assessing five linguistic categories plus a factual knowledge category
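The grade- and category-wise analysis described above can be sketched as a simple aggregation over annotated items. This is a hypothetical illustration, not the paper's actual evaluation code; the field names, category labels, and sample data are assumptions.

```python
# Hypothetical sketch of category- and grade-wise accuracy aggregation,
# in the spirit of the benchmark's evaluation framework. Item fields and
# data are illustrative, not drawn from ILAKKANAM itself.
from collections import defaultdict

def accuracy_by(items, key):
    """Group model predictions by a metadata field and compute accuracy per group."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item[key]] += 1
        if item["prediction"] == item["answer"]:
            correct[item[key]] += 1
    return {k: correct[k] / total[k] for k in total}

# Toy items: one annotated question per row, with a model's prediction.
items = [
    {"category": "morphology", "grade": 3,  "answer": "A", "prediction": "A"},
    {"category": "syntax",     "grade": 10, "answer": "B", "prediction": "C"},
    {"category": "morphology", "grade": 10, "answer": "D", "prediction": "D"},
]

print(accuracy_by(items, "category"))  # {'morphology': 1.0, 'syntax': 0.0}
print(accuracy_by(items, "grade"))     # {3: 1.0, 10: 0.5}
```

Slicing the same per-item correctness by grade rather than category is what exposes the decline the authors report as linguistic complexity rises.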
Jeyarajalingam Varsha
Department of Computer Science, University of Jaffna, Sri Lanka

Menan Velayuthan
Research Engineer, University of Moratuwa
Neural Machine Translation · Graph Neural Networks · Deep Sets · Geometric Deep Learning · Computer

Sumirtha Karunakaran
Department of Computer Science, University of Jaffna, Sri Lanka

Rasan Nivethiga
Department of Computer Science, University of Jaffna, Sri Lanka

Kengatharaiyer Sarveswaran
Dept. of Computer Science, University of Jaffna / Dept. of Linguistics, University of Konstanz
Natural Language Processing · Computational Linguistics