🤖 AI Summary
Generating classical Sanskrit poetry that strictly adheres to metrical constraints such as Anuṣṭubh remains challenging: Sanskrit is a low-resource, morphologically rich language, high-quality training data is scarce, and existing models capture prosodic structure poorly.
Method: We introduce the first parallel corpus of Sanskrit metrical poetry; propose a metricality-aware constrained decoding strategy; and design a semantic-prosodic co-fine-tuning paradigm integrating Sanskrit morphological analysis, rule-based prosody modeling, and instruction tuning across multiple open-source and commercial LLMs.
Contribution/Results: Our decoding approach achieves >99% metrical compliance. Fine-tuned models significantly outperform baselines in semantic fidelity and stylistic appropriateness (p < 0.01, human evaluation). This work establishes a reusable data resource, methodology, and evaluation framework for structured literary generation in low-resource languages.
📝 Abstract
Recent advances in large language models (LLMs) have significantly improved natural language generation, including creative tasks like poetry composition. However, most progress remains concentrated in high-resource languages. This raises an important question: Can LLMs be adapted for structured poetic generation in a low-resource, morphologically rich language such as Sanskrit? In this work, we introduce a dataset designed for translating English prose into structured Sanskrit verse, with strict adherence to classical metrical patterns, particularly the Anuṣṭubh meter. We evaluate a range of generative models, both open-source and proprietary, under multiple settings. Specifically, we explore constrained decoding strategies and instruction-based fine-tuning tailored to metrical and semantic fidelity. Our decoding approach achieves over 99% accuracy in producing syntactically valid poetic forms, substantially outperforming general-purpose models in meter conformity. Meanwhile, instruction-tuned variants show improved alignment with source meaning and poetic style, as supported by human assessments, albeit with marginal trade-offs in metrical precision.
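The abstract does not spell out how meter conformity is checked, so the following is only a rough illustration and not the authors' implementation. It assumes verses are given in IAST transliteration, one pāda (quarter-verse) per string, and it applies the common textbook rule for the pathyā form of the meter: each pāda has eight syllables, the fifth is light (laghu), the sixth is heavy (guru), and the seventh is heavy in odd pādas and light in even ones. The names `scan` and `is_pathya_anustubh` are hypothetical, and the tokenizer ignores edge cases such as vowel hiatus and a `k` + `h` sequence across a word boundary.

```python
import re
import unicodedata


def _nfc(s: str) -> str:
    """Normalize to NFC so diacritics compare consistently."""
    return unicodedata.normalize("NFC", s)


# IAST inventory (assumption: input is IAST, not Devanagari).
LONG_VOWELS = {_nfc(v) for v in ("ā", "ī", "ū", "ṝ", "e", "o", "ai", "au")}
VOWELS = LONG_VOWELS | {_nfc(v) for v in ("a", "i", "u", "ṛ", "ḷ")}
ANUSVARA, VISARGA = _nfc("ṃ"), _nfc("ḥ")

# Diphthongs and aspirated stops are matched before single letters.
TOKEN_RE = re.compile(_nfc(
    "ai|au|[aāiīuūṛṝḷeo]"
    "|[kgcjṭḍtdpb]h"
    "|[kgṅcjñṭḍṇtdnpbmyrlvśṣsh]"
    "|[ṃḥ]"
))


def scan(pada: str) -> str:
    """Return one 'G' (guru) or 'L' (laghu) per syllable of a pāda."""
    tokens = TOKEN_RE.findall(_nfc(pada.lower()))  # spaces are dropped
    weights = []
    for i, tok in enumerate(tokens):
        if tok not in VOWELS:
            continue
        # Collect everything between this vowel and the next one.
        following = []
        for nxt in tokens[i + 1:]:
            if nxt in VOWELS:
                break
            following.append(nxt)
        heavy = (
            tok in LONG_VOWELS
            or ANUSVARA in following
            or VISARGA in following
            or len(following) >= 2  # consonant cluster
        )
        weights.append("G" if heavy else "L")
    return "".join(weights)


def is_pathya_anustubh(padas: list[str]) -> bool:
    """Check four pādas against the strict pathyā pattern."""
    if len(padas) != 4:
        return False
    for idx, pada in enumerate(padas):
        w = scan(pada)
        if len(w) != 8:
            return False
        seventh = "G" if idx % 2 == 0 else "L"  # pādas 1 and 3 vs 2 and 4
        if w[4] != "L" or w[5] != "G" or w[6] != seventh:
            return False
    return True


verse = [
    "dharmakṣetre kurukṣetre",
    "samavetā yuyutsavaḥ",
    "māmakāḥ pāṇḍavāś caiva",
    "kim akurvata sañjaya",
]
print([scan(p) for p in verse], is_pathya_anustubh(verse))
```

The final syllable of each pāda is deliberately left unchecked, since it is conventionally treated as free. A checker this strict also rejects the permitted vipulā variants of the meter, so a practical evaluation would likely need a wider rule set than the one assumed here.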