🤖 AI Summary
To address the performance bottleneck in grammatical error correction (GEC) for Indo-Aryan and Dravidian languages, which stems from low-resource conditions and high morphological complexity, this paper proposes a lightweight, prompt-based cross-lingual GEC method. The approach leverages efficient zero-shot and few-shot prompting strategies, harnessing the cross-lingual generalization capabilities of large language models (LLMs), including GPT-4.1, Gemini-2.5, and LLaMA-4, without any fine-tuning. The key contribution is the demonstration that carefully engineered prompts alone can activate robust grammatical sensitivity across multiple Indian languages, validating the strong zero-shot cross-lingual transfer capacity of modern LLMs. Evaluated on a benchmark shared task, the method achieves leading results: first place in Tamil (GLEU 91.57) and Hindi (GLEU 85.69), and second place in Telugu (GLEU 85.22), with competitive performance on the remaining languages.
📝 Abstract
Grammatical error correction (GEC) is an important task in Natural Language Processing that aims to automatically detect and correct grammatical mistakes in text. While recent advances in transformer-based models and large annotated datasets have greatly improved GEC performance for high-resource languages such as English, progress has not extended equally to other languages. For most Indic languages, GEC remains challenging due to limited resources, linguistic diversity, and complex morphology. In this work, we explore prompting-based approaches using state-of-the-art large language models (LLMs), such as GPT-4.1, Gemini-2.5, and LLaMA-4, combined with few-shot strategies to adapt them to low-resource settings. We observe that even basic prompting strategies, such as zero-shot and few-shot approaches, enable these LLMs to substantially outperform fine-tuned Indic-language models like Sarvam-22B, illustrating the exceptional multilingual generalization capabilities of contemporary LLMs for GEC. Our experiments show that carefully designed prompts and lightweight adaptation significantly enhance correction quality across multiple Indic languages. We achieved leading results in the shared task: 1st in Tamil (GLEU: 91.57) and Hindi (GLEU: 85.69), 2nd in Telugu (GLEU: 85.22), 4th in Bangla (GLEU: 92.86), and 5th in Malayalam (GLEU: 92.97). These findings highlight the effectiveness of prompt-driven NLP techniques and underscore the potential of large-scale LLMs to bridge resource gaps in multilingual GEC.
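The few-shot setup described above can be sketched as simple prompt construction: a task instruction followed by a handful of (erroneous, corrected) example pairs, then the target sentence. This is a minimal illustrative sketch only; the `build_gec_prompt` helper, the instruction wording, and the Input/Output format are assumptions, not the paper's actual prompts.

```python
def build_gec_prompt(sentence, examples, language="Hindi"):
    """Assemble a few-shot GEC prompt for an instruction-following LLM.

    examples: list of (erroneous_sentence, corrected_sentence) pairs.
    Note: the exact wording and format here are hypothetical, not the
    prompts used in the paper.
    """
    lines = [
        f"Correct the grammatical errors in the following {language} sentence. "
        "Return only the corrected sentence."
    ]
    # Demonstration pairs give the model the expected input/output format.
    for erroneous, corrected in examples:
        lines.append(f"Input: {erroneous}")
        lines.append(f"Output: {corrected}")
    # The target sentence; the model completes the final "Output:" line.
    lines.append(f"Input: {sentence}")
    lines.append("Output:")
    return "\n".join(lines)
```

With an empty `examples` list, the same helper degenerates to the zero-shot case: instruction plus target sentence only.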