Comparative Study of Zero-Shot Cross-Lingual Transfer for Bodo POS and NER Tagging Using Gemini 2.0 Flash Thinking Experimental Model

📅 2025-03-06
🤖 AI Summary
This study investigates zero-shot cross-lingual transfer for part-of-speech (POS) tagging and named entity recognition (NER) in the low-resource language Bodo. We conduct the first empirical evaluation of Gemini 2.0 Flash Thinking Experimental for zero-shot POS/NER transfer to Bodo, comparing two strategies: (1) direct translation of English sentences into Bodo followed by tag transfer, and (2) prompt-based tag transfer on parallel English-Bodo sentence pairs. The prompt-based approach outperforms direct translation, particularly for NER, and the analysis identifies grammatical divergence and translation quality as the primary bottlenecks. We propose and validate a prompt-driven cross-lingual annotation paradigm tailored to low-resource languages (LRLs), and establish the first empirically grounded Bodo benchmark for POS and NER. This work delivers a reproducible methodological framework and practical guidelines for NLP resource development in LRLs.

📝 Abstract
Named Entity Recognition (NER) and Part-of-Speech (POS) tagging are critical tasks for Natural Language Processing (NLP), yet tools and annotated data for them remain scarce for low-resource languages (LRLs) like Bodo. This article presents a comparative empirical study investigating the effectiveness of Google's Gemini 2.0 Flash Thinking Experimental model for zero-shot cross-lingual transfer of POS and NER tagging to Bodo. We explore two distinct methodologies: (1) direct translation of English sentences to Bodo followed by tag transfer, and (2) prompt-based tag transfer on parallel English-Bodo sentence pairs. Both methods leverage the machine translation and cross-lingual understanding capabilities of Gemini 2.0 Flash Thinking Experimental to project English POS and NER annotations onto Bodo text in CoNLL-2003 format. Our findings reveal the capabilities and limitations of each approach, demonstrating that while both methods show promise for bootstrapping Bodo NLP, prompt-based transfer exhibits superior performance, particularly for NER. We provide a detailed analysis of the results, highlighting the impact of translation quality, grammatical divergences, and the inherent challenges of zero-shot cross-lingual transfer. The article concludes by discussing future research directions, emphasizing the need for hybrid approaches, few-shot fine-tuning, and the development of dedicated Bodo NLP resources to achieve high-accuracy POS and NER tagging for this low-resource language.
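To make the prompt-based transfer idea concrete, the following is a minimal sketch of how a tag-projection prompt for one parallel sentence pair might be assembled and how a CoNLL-style reply could be parsed. The prompt wording, the function names `build_transfer_prompt` and `parse_conll_response`, and the `TOKEN POS NER` reply layout are assumptions made for illustration; they are not taken from the paper, and no model API call is shown.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    token: str
    pos: str
    ner: str


def build_transfer_prompt(english_conll: str, bodo_sentence: str) -> str:
    """Assemble a zero-shot tag-projection prompt.

    The wording is hypothetical; the paper's actual prompt is not reproduced here.
    """
    return (
        "You are given an English sentence annotated in CoNLL-2003 style "
        "and its Bodo translation.\n"
        "Project the POS and NER tags onto the Bodo tokens.\n"
        "Answer with one Bodo token per line as: TOKEN POS NER\n\n"
        f"English annotation:\n{english_conll.strip()}\n\n"
        f"Bodo sentence:\n{bodo_sentence.strip()}\n"
    )


def parse_conll_response(response: str) -> List[List[TaggedToken]]:
    """Parse whitespace-separated CoNLL-style lines into sentences.

    Assumption: the first column is the token, the second the POS tag,
    and the last the NER tag, so both three-column replies and the
    four-column CoNLL-2003 layout (token, POS, chunk, NER) are accepted.
    Blank lines and -DOCSTART- lines end the current sentence.
    """
    sentences: List[List[TaggedToken]] = []
    current: List[TaggedToken] = []
    for raw in response.splitlines():
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        if len(cols) < 3:
            continue  # skip malformed lines rather than guess at tags
        current.append(TaggedToken(cols[0], cols[1], cols[-1]))
    if current:
        sentences.append(current)
    return sentences
```

In practice the parsed output would still need checking against the Bodo tokenization, since a model reply may merge, split, or drop tokens.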
Problem

Research questions and friction points this paper is trying to address.

Evaluating zero-shot cross-lingual transfer for Bodo POS and NER tagging.
Comparing direct translation and prompt-based methods using Gemini 2.0.
Assessing challenges in low-resource language NLP for Bodo.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot cross-lingual transfer using Gemini 2.0
Direct translation and prompt-based tag transfer
Analysis of translation quality and grammatical divergences
Sanjib Narzary
Assistant Professor, Computer Science & Engineering, Central Institute of Technology Kokrajhar
Machine Translation, Neural Machine Translation, Computational Linguistics, Natural Language Processing, Deep Learning
Bihung Brahma
Central Institute of Technology Kokrajhar, JD Road, Kokrajhar, 783370, Assam, India
Haradip Mahilary
Central Institute of Technology Kokrajhar, JD Road, Kokrajhar, 783370, Assam, India
Mahananda Brahma
Central Institute of Technology Kokrajhar, JD Road, Kokrajhar, 783370, Assam, India
Bidisha Som
Centre for Linguistic Science and Technology, IIT Guwahati, North Guwahati, Guwahati, 781039, Assam, India
Sukumar Nandi
Professor, Indian Institute of Technology Guwahati
Computer Networks, Information Security