🤖 AI Summary
Existing approaches struggle to track the evolution of ideas uniformly across multiple concepts and corpora; they often rely heavily on explicit lexical cues and lack fine-grained analysis of implicit concepts. This work proposes HistLens, a novel framework that integrates sparse autoencoders (SAEs), diachronic semantic modeling, and cross-corpus alignment techniques to decompose concepts into interpretable features and model their activation dynamics across time and corpora within a unified coordinate system. HistLens enables, for the first time, the construction of comparable evolutionary trajectories of implicit concepts across heterogeneous sources. Evaluated on long-term news data, the framework demonstrates strong empirical validity, accurately uncovering patterns of conceptual and cross-corpus ideological evolution.
📝 Abstract
Language change both reflects and shapes social processes, and the semantic evolution of foundational concepts provides a measurable trace of historical and social transformation. Despite recent advances in diachronic semantics and discourse analysis, existing computational approaches often (i) concentrate on a single concept or a single corpus, making findings difficult to compare across heterogeneous sources, and (ii) remain confined to surface lexical evidence, offering insufficient computational and interpretive granularity when concepts are expressed implicitly. We propose HistLens, a unified, SAE-based framework for multi-concept, multi-corpus conceptual-history analysis. The framework decomposes concept representations into interpretable features and tracks their activation dynamics over time and across sources, yielding comparable conceptual trajectories within a shared coordinate system. Experiments on long-span press corpora show that HistLens supports cross-concept, cross-corpus analysis of how ideas evolve and extends that analysis to concepts that are expressed implicitly. By bridging conceptual modeling with interpretive needs, HistLens broadens the analytical perspectives and methodological repertoire available to the social sciences and the humanities for diachronic text analysis.
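To make the core idea concrete, the sketch below shows one way feature activations could be tracked over time and across sources. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the SAE architecture, the aggregation rule, or the data, so the ReLU encoder, the per-(corpus, year) mean, the random weights standing in for a trained SAE, and the names `sae_encode` and `feature_trajectories` are all hypothetical.

```python
import numpy as np

def sae_encode(x, W_enc, b_enc):
    # Assumed encoder form: a linear map followed by ReLU,
    # which yields non-negative feature activations.
    return np.maximum(0.0, x @ W_enc + b_enc)

def feature_trajectories(embeddings, years, corpora, W_enc, b_enc):
    # Assumed aggregation: mean feature activation per (corpus, year).
    # Every group is encoded with the same weights, so the resulting
    # vectors live in one shared feature space and can be compared.
    acts = sae_encode(embeddings, W_enc, b_enc)
    years = np.asarray(years)
    corpora = np.asarray(corpora)
    out = {}
    for corpus in sorted(set(corpora.tolist())):
        for year in sorted(set(years.tolist())):
            mask = (corpora == corpus) & (years == year)
            if mask.any():
                out[(corpus, year)] = acts[mask].mean(axis=0)
    return out

# Toy data: random vectors stand in for concept representations,
# and random weights stand in for a trained SAE encoder.
rng = np.random.default_rng(0)
d_model, n_features = 8, 16
W_enc = rng.normal(size=(d_model, n_features))
b_enc = np.zeros(n_features)
embeddings = rng.normal(size=(6, d_model))
years = [1950, 1950, 1960, 1950, 1960, 1960]
corpora = ["paper_a", "paper_a", "paper_a", "paper_b", "paper_b", "paper_b"]

traj = feature_trajectories(embeddings, years, corpora, W_enc, b_enc)
for key in sorted(traj):
    print(key, np.round(traj[key][:3], 2))
```

Each entry of `traj` is a vector over the same feature indices, so a given feature can be followed across years within one corpus or compared between corpora in the same year.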