AI Summary
This study addresses the early detection of emerging concepts in large-scale textual corpora. We propose a novel dynamic detection method based on temporal evolution of embedding-space heatmaps. Unlike conventional approaches relying on semantic drift or lexical frequency statistics, our method projects high-dimensional word embeddings into interpretable semantic heatmaps and explicitly models their spatiotemporal distributional shifts across time periods, enabling fine-grained and high-precision identification of conceptual emergence. Evaluations on U.S. Senate speech transcripts (1941–2015) demonstrate statistically significant improvements over state-of-the-art baselines. Further analysis reveals that minority-party senators exhibit higher propensity to introduce novel concepts, and concept emergence exhibits statistically significant associations with legislators' racial, ethnic, and gender identities. The source code and trained models are publicly available.
Abstract
We introduce a new method to identify emerging concepts in large text corpora. By analyzing changes in the heatmaps of the underlying embedding space, we are able to detect these concepts with high accuracy shortly after they originate, in turn outperforming common alternatives. We further demonstrate the utility of our approach by analyzing speeches in the U.S. Senate from 1941 to 2015. Our results suggest that the minority party is more active in introducing new concepts into the Senate discourse. We also identify specific concepts that closely correlate with the Senators' racial, ethnic, and gender identities. An implementation of our method is publicly available.