🤖 AI Summary
This study addresses author identification by proposing a deep learning approach that models part-of-speech (POS) sequences. To overcome the topic sensitivity of conventional lexical features and the insufficiency of isolated POS distributions, we construct a POS bigram frequency matrix and employ both fully connected and convolutional neural networks for stylistic representation learning. We provide empirical evidence that POS bigrams capture author-specific syntactic habits more effectively than unigram POS distributions. Furthermore, multidimensional scaling (MDS) is applied to visualize the high-dimensional stylistic space, revealing distinct author clusters. Experiments demonstrate that the bigram-based model significantly improves classification accuracy over the unigram baseline, validating the discriminative power of syntactic sequence patterns in stylometry. The approach yields an interpretable and robust framework for computational stylistics.
📝 Abstract
Deep learning methods have been increasingly applied to computational linguistics to uncover patterns in text data. This study investigates author-specific word-class distributions using part-of-speech (POS) tagging and bigram analysis. By leveraging deep neural networks, we classify literary authors based on POS tag vectors and bigram frequency matrices derived from their works. We employ fully connected and convolutional neural network architectures to compare the efficacy of unigram- and bigram-based representations. Our results demonstrate that while unigram features achieve moderate classification accuracy, bigram-based models significantly improve performance, suggesting that sequential word-class patterns are more distinctive of authorial style. Multidimensional scaling (MDS) visualizations reveal meaningful clustering of authors' works, supporting the hypothesis that stylistic nuances can be captured through computational methods. These findings highlight the potential of deep learning and linguistic feature analysis for author profiling and literary studies.
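To make the two feature types concrete, here is a minimal sketch of how a unigram POS vector and a POS bigram frequency matrix could be built from an already tagged text. The abstract does not state which tagger, tag set, or normalization was used, so the five-tag `TAGSET`, the function names, and the relative-frequency normalization below are all illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical, reduced tag set for illustration; a real tagger emits more tags.
TAGSET = ["DET", "ADJ", "NOUN", "VERB", "ADV"]
INDEX = {tag: i for i, tag in enumerate(TAGSET)}


def unigram_vector(tags):
    """Relative frequency of each POS tag, in TAGSET order."""
    if not tags:
        raise ValueError("need at least one tag")
    counts = Counter(tags)
    return [counts[tag] / len(tags) for tag in TAGSET]


def bigram_matrix(tags):
    """Relative frequency of adjacent tag pairs.

    Rows index the first tag of a pair, columns the second.
    Assumes every tag in `tags` appears in TAGSET.
    """
    if len(tags) < 2:
        raise ValueError("need at least two tags to form a bigram")
    size = len(TAGSET)
    matrix = [[0.0] * size for _ in range(size)]
    pairs = list(zip(tags, tags[1:]))
    for first, second in pairs:
        matrix[INDEX[first]][INDEX[second]] += 1.0
    return [[count / len(pairs) for count in row] for row in matrix]


# Two toy tag sequences with the same tags in a different order.
text_a = ["DET", "ADJ", "NOUN", "VERB", "ADV"]
text_b = ["ADV", "VERB", "DET", "NOUN", "ADJ"]

same_unigrams = unigram_vector(text_a) == unigram_vector(text_b)
same_bigrams = bigram_matrix(text_a) == bigram_matrix(text_b)
print(same_unigrams, same_bigrams)
```

In this toy case the two sequences have identical unigram vectors but different bigram matrices, which illustrates why sequential word-class patterns can separate texts that tag frequencies alone cannot. A matrix of this shape can be flattened for a fully connected network or passed as a single-channel 2D input to a convolutional one.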