🤖 AI Summary
This work addresses two linked problems in historical texts: identifying temporal breakpoints and disentangling authorial style. It uses the State of the Union addresses of 42 U.S. presidents, a small and highly heterogeneous corpus. We propose a novel framework that integrates temporal discontinuity modeling with author-style analysis. Methodologically, it combines linguistic dynamical modeling, cross-term stylistic representation learning, and a lightweight temporal classifier to jointly infer when a text was written and who wrote it. The key contribution is making term-level temporal attribution and author identification work under data-scarce historical conditions. Experiments show about 95% accuracy on authorship attribution and localize the date of writing to a single presidential term, substantially outperforming existing baselines.
📝 Abstract
In this technical note, we suggest a novel approach to discovering temporal aspects (both related and unrelated to language dilation) and personality aspects (authorship attribution) in historical datasets. We exemplify our approach on the State of the Union addresses given by the past 42 US presidents. This dataset is known for its relatively small amount of data and for the high variability in the size and style of its texts. Nevertheless, we achieve about 95% accuracy on the authorship attribution task and pin down the date of writing to a single presidential term.