🤖 AI Summary
Authorship attribution in classical Persian poetry is difficult because of archaic language, implicit stylistic features, and strict metrical constraints, all of which hinder existing computational approaches. This paper proposes a multi-input neural framework that integrates semantic, stylistic, and metrical cues. The model combines a Transformer-based language encoder with 100-dimensional Word2Vec embeddings, seven quantitative stylometric measures, and categorical encodings of poetic form and meter. Verse-level classification is combined with weighted voting, and confidence-threshold filtering (≥0.9) restricts output to high-confidence predictions. Evaluated on a corpus of 67 poets and 647,653 verses from the Ganjoor digital collection, the framework reaches 71% accuracy with weighted voting, rising to 97% on predictions that pass the 0.9 threshold, at the cost of lower coverage. The work provides a large-scale benchmark for Persian poetic authorship attribution and a multi-input approach that supports computational analysis of classical literary texts.
📝 Abstract
The intricate linguistic, stylistic, and metrical aspects of classical Persian poetry pose a challenge for computational authorship attribution. In this work, we present a framework for determining authorship among 67 prominent poets. We employ a multi-input neural architecture consisting of a transformer-based language encoder complemented by features that address the semantic, stylometric, and metrical dimensions of Persian poetry. Our feature set comprises 100-dimensional Word2Vec embeddings, seven stylometric measures, and categorical encodings of poetic form and meter. We compiled a corpus of 647,653 verses from the Ganjoor digital collection, validating the data through strict preprocessing and author verification and splitting at the poem level to prevent overlap. We perform verse-level classification and evaluate majority and weighted voting schemes; weighted voting yields 71% accuracy. We further investigate threshold-based decision filtering, which lets the model emit only highly confident predictions and achieves 97% accuracy at a 0.9 threshold, though at lower coverage. Our work focuses on integrating deep representations with domain-specific features for improved authorship attribution. The results illustrate the potential of our approach for automated classification and its contribution to stylistic analysis, authorship disputes, and computational literary research more broadly. This work can support future research on multilingual author attribution, style shift, and generative modeling of Persian poetry.
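To make the voting and threshold-filtering steps concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not confirm: that verse-level class probabilities are aggregated per poem, that each verse's vote is weighted by its top-class probability, and that the confidence compared with the threshold is the winning author's share of the total vote weight. The function names (`majority_vote`, `weighted_vote`, `predict_with_threshold`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def majority_vote(verse_probs: np.ndarray) -> int:
    """Each verse casts one unweighted vote for its most probable author."""
    preds = verse_probs.argmax(axis=1)
    counts = np.bincount(preds, minlength=verse_probs.shape[1])
    return int(counts.argmax())


def weighted_vote(verse_probs: np.ndarray) -> tuple[int, float]:
    """Each verse votes for its top author, weighted by its own confidence.

    Assumed weighting scheme: the weight is the verse's top-class probability.
    Returns the winning author index and that author's share of the total
    vote weight, which this sketch treats as the aggregate confidence.
    """
    preds = verse_probs.argmax(axis=1)
    weights = verse_probs.max(axis=1)
    scores = np.bincount(preds, weights=weights, minlength=verse_probs.shape[1])
    winner = int(scores.argmax())
    return winner, float(scores[winner] / scores.sum())


def predict_with_threshold(verse_probs: np.ndarray, threshold: float = 0.9):
    """Return the winning author, or None (abstain) below the threshold."""
    author, confidence = weighted_vote(verse_probs)
    return author if confidence >= threshold else None


# Toy example: three verses, three candidate authors (rows sum to 1).
probs = np.array([
    [0.95, 0.03, 0.02],  # one very confident verse for author 0
    [0.40, 0.45, 0.15],  # two weakly confident verses for author 1
    [0.40, 0.45, 0.15],
])
print(majority_vote(probs))           # author chosen by unweighted counts
print(weighted_vote(probs))           # author and confidence share
print(predict_with_threshold(probs))  # None when confidence is below 0.9
```

In this toy case the two schemes disagree: the unweighted count favours author 1, while the single highly confident verse tips the weighted vote to author 0. Because the vote is split, the aggregate confidence falls below 0.9 and the thresholded predictor abstains. Abstaining on uncertain cases in this way is what trades coverage for accuracy.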