🤖 AI Summary
Problem: Static word embeddings (e.g., Word2Vec) capture long-term semantic change in political texts but are unstable over short periods, because unbalanced training data makes the embeddings fluctuate. This undermines their reliability for analyses that assume word meanings stay fixed. Method: We systematically compare BERT and Word2Vec on two dimensions, semantic stability across timeframes and sensitivity to subtle semantic variation, using a 20-year corpus of *People’s Daily* political texts. We propose a time-segmented semantic similarity framework and introduce a quantitative metric for word-vector stability. Contribution/Results: BERT's contextualized embeddings are more semantically stable than Word2Vec's while still capturing fine-grained, policy-related lexical variation. This finding challenges the assumed trade-off between stability and contextual sensitivity and offers a more reliable representation for political text analysis that requires both temporal consistency and sensitivity to change.
📝 Abstract
Accurately interpreting words is vital in political science text analysis: some tasks require assuming semantic stability, while others aim to trace semantic shifts. Traditional static embeddings such as Word2Vec effectively capture long-term semantic changes but often lack stability in short-term contexts, because unbalanced training data causes the embeddings to fluctuate. BERT, with its transformer-based architecture and contextual embeddings, offers greater semantic consistency, making it suitable for analyses in which stability is crucial. This study compares Word2Vec and BERT on 20 years of People's Daily articles to evaluate how well each represents meaning across different timeframes. The results indicate that BERT outperforms Word2Vec in maintaining semantic stability while still recognizing subtle semantic variations. These findings support using BERT for text analysis tasks that require stability, where semantic change is not assumed, as it offers a more reliable foundation than static alternatives.
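To make "short-term stability" concrete, here is a minimal sketch of one common way to score it for static embeddings trained separately on two time segments: compare a word's k nearest neighbors in each segment. This is an illustrative assumption, not the paper's metric, whose definition is not given here. The function names, the Jaccard-overlap scoring, and the three-dimensional toy vectors are all hypothetical and are not drawn from People's Daily. Neighbor overlap is used because independently trained Word2Vec spaces are not aligned, so the raw cosine between a word's two vectors can be low even when its meaning has not changed.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical aliases: a word's vector, and one time segment's vocabulary.
Vector = Sequence[float]
Embeddings = Dict[str, Vector]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(emb: Embeddings, word: str, k: int) -> List[str]:
    """The k words in `emb` most cosine-similar to `word`, excluding `word` itself."""
    target = emb[word]
    candidates = [w for w in emb if w != word]
    candidates.sort(key=lambda w: cosine(target, emb[w]), reverse=True)
    return candidates[:k]


def neighbor_stability(emb_a: Embeddings, emb_b: Embeddings, word: str, k: int = 2) -> float:
    """Hypothetical stability score: Jaccard overlap of `word`'s k-NN sets in two segments.

    Only vocabulary present in both segments is searched, so both neighbor
    sets come from the same candidates. 1.0 means identical neighborhoods;
    0.0 means no shared neighbors.
    """
    shared = set(emb_a) & set(emb_b)
    nn_a = set(nearest_neighbors({w: emb_a[w] for w in shared}, word, k))
    nn_b = set(nearest_neighbors({w: emb_b[w] for w in shared}, word, k))
    union = nn_a | nn_b
    return len(nn_a & nn_b) / len(union) if union else 1.0


# Toy 3-d vectors, invented for illustration only.
emb_2000: Embeddings = {
    "reform": [1.0, 0.9, 0.0],
    "policy": [0.9, 1.0, 0.1],
    "economy": [0.8, 0.7, 0.2],
    "weather": [0.0, 0.1, 1.0],
}
# Same geometry with coordinates permuted (x, y, z) -> (z, x, y), mimicking an
# independently trained, unaligned space in which no meaning has changed.
emb_2010: Embeddings = {w: [v[2], v[0], v[1]] for w, v in emb_2000.items()}
# A segment in which "reform" has drifted toward "weather".
emb_shifted: Embeddings = {**emb_2000, "reform": [0.1, 0.0, 1.0]}

print(f"{neighbor_stability(emb_2000, emb_2010, 'reform'):.2f}")     # prints 1.00
print(f"{cosine(emb_2000['reform'], emb_2010['reform']):.2f}")        # prints 0.50
print(f"{neighbor_stability(emb_2000, emb_shifted, 'reform'):.2f}")  # prints 0.33
```

The first two lines show why raw cross-segment cosine can mislead for separately trained static embeddings: the toy 2010 space is only a coordinate permutation of the 2000 space, so every neighborhood is identical (stability 1.00) even though the direct cosine of the two "reform" vectors is about 0.50. If a single BERT model encodes all segments, its vectors share one space, so per-segment averages of a word's contextual embeddings could also be compared directly; how the paper aggregates them is not stated here.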