PARSI: Persian Authorship Recognition via Stylometric Integration

📅 2025-06-26
🤖 AI Summary
Authorship attribution in classical Persian poetry is difficult for existing computational approaches because of archaic language, implicit stylistic features, and strict metrical constraints. This paper proposes a multi-input neural framework that integrates semantic, stylometric, and metrical cues. The model combines a Transformer-based language encoder with 100-dimensional Word2Vec embeddings, seven stylometric measures, and categorical encodings of poetic form and meter. Verse-level predictions are aggregated with majority and weighted voting, and a confidence threshold can be applied to keep only high-confidence predictions. Evaluated on a corpus of 647,653 verses by 67 poets from the Ganjoor digital collection, weighted voting reaches 71% accuracy, rising to 97% for predictions at or above a 0.9 confidence threshold, at the cost of lower coverage. The work shows how deep representations and domain-specific features can be combined for the computational analysis of classical literary texts.

📝 Abstract
The intricate linguistic, stylistic, and metrical aspects of Persian classical poetry pose a challenge for computational authorship attribution. In this work, we present a versatile framework to determine authorship among 67 prominent poets. We employ a multi-input neural framework consisting of a transformer-based language encoder complemented by features addressing the semantic, stylometric, and metrical dimensions of Persian poetry. Our feature set encompasses 100-dimensional Word2Vec embeddings, seven stylometric measures, and categorical encodings of poetic form and meter. We compiled a corpus of 647,653 verses from the Ganjoor digital collection, validating the data through strict preprocessing and author verification while splitting at the poem level to prevent overlap between training and evaluation data. We classify at the verse level and evaluate with majority and weighted voting schemes, finding that weighted voting yields 71% accuracy. We further investigate threshold-based decision filtering, which lets the model return only highly confident predictions, achieving 97% accuracy at a 0.9 threshold, though at lower coverage. Our work focuses on integrating deep representations with domain-specific features for improved authorship attribution. The results illustrate the potential of our approach for automated classification and its contribution to stylistic analysis, authorship disputes, and computational literary research more generally. This work should facilitate further research on multilingual author attribution, style shift, and generative modeling of Persian poetry.
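The difference between the majority and weighted voting schemes mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state how votes are grouped or weighted, so this assumes that verse-level predictions are aggregated per poem and that each vote is weighted by the classifier's confidence in it. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: each verse prediction is a (predicted_author, confidence) pair,
# and all pairs in the list belong to the same poem.

def majority_vote(verse_preds):
    """Return the author predicted for the largest number of verses."""
    counts = Counter(author for author, _ in verse_preds)
    return counts.most_common(1)[0][0]

def weighted_vote(verse_preds):
    """Return the author with the largest summed confidence.

    Weighting by confidence is an illustrative choice, not necessarily
    the weighting used in the paper.
    """
    scores = defaultdict(float)
    for author, confidence in verse_preds:
        scores[author] += confidence
    return max(scores, key=scores.get)

# Toy poem: two low-confidence votes for one poet, one confident vote for another.
poem = [("Hafez", 0.40), ("Hafez", 0.35), ("Saadi", 0.95)]

majority_label = majority_vote(poem)   # counts votes only
weighted_label = weighted_vote(poem)   # sums confidences per author
```

On this toy poem the two schemes disagree: the majority vote follows the two weak predictions, while the weighted vote follows the single confident one.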
Problem

Research questions and friction points this paper is trying to address.

Persian poetry authorship recognition using stylometric features
Multi-input neural framework for classifying verses among 67 poets
Integrating deep learning with domain-specific poetic attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-input neural framework with transformer encoder
Integrates Word2Vec, stylometric, and metrical features
Threshold-based filtering for high-confidence predictions
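Threshold-based filtering trades coverage for accuracy: only predictions whose confidence reaches the threshold are kept, and accuracy is measured on that subset. The sketch below shows the idea under stated assumptions. The `(predicted, true, confidence)` tuple layout, the function name, and the toy data are hypothetical; only the 0.9 threshold comes from the abstract.

```python
def filter_by_confidence(predictions, threshold=0.9):
    """Keep predictions with confidence >= threshold.

    predictions: list of (predicted_author, true_author, confidence) tuples
    (an assumed layout for illustration).
    Returns (coverage, accuracy), where coverage is the fraction of
    predictions kept and accuracy is computed on the kept subset only.
    Accuracy is None when nothing passes the threshold.
    """
    kept = [p for p in predictions if p[2] >= threshold]
    coverage = len(kept) / len(predictions) if predictions else 0.0
    if not kept:
        return coverage, None
    correct = sum(1 for predicted, true, _ in kept if predicted == true)
    return coverage, correct / len(kept)

# Toy data: the one wrong prediction has low confidence.
toy = [
    ("Hafez", "Hafez", 0.97),
    ("Saadi", "Hafez", 0.55),
    ("Rumi", "Rumi", 0.93),
    ("Saadi", "Saadi", 0.60),
]

coverage_all, accuracy_all = filter_by_confidence(toy, threshold=0.0)
coverage_hi, accuracy_hi = filter_by_confidence(toy, threshold=0.9)
```

In this toy case, raising the threshold from 0.0 to 0.9 halves the coverage and removes the only error, which mirrors the accuracy-versus-coverage trade-off described in the abstract.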