🤖 AI Summary
Existing studies on scientific term evolution overlook implicit semantic reconstruction—where nominal identity persists while conceptual content shifts, as exemplified by the “Ship of Theseus” paradox.
Method: We introduce the novel concept of the “Language Model Ship” and construct a specialized corpus from ten years of top-tier NLP conference papers. Integrating quantitative text analysis, term co-occurrence modeling, and diachronic semantic tracking, we systematically trace how “language model” evolves.
Contribution/Results: Empirical analysis reveals three distinct referential shifts over the decade—RNN-based → Transformer-based → LLM-based—tightly coupled with dominant architectural advances. This demonstrates that semantic drift is not noise but a core mechanism of scientific progress, wherein theoretical conceptualization and system implementation co-constitute discourse evolution. The study establishes implicit semantic reconstruction as a fundamental driver of scientific language change.
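The referential shifts described above can be illustrated with a minimal sketch of diachronic co-occurrence tracking. The toy corpus, the `ARCH_TERMS` vocabulary, and both helper functions below are illustrative assumptions, not the paper's actual data or pipeline: the idea is simply to count which architecture terms co-occur with "language model" in each year and read off the dominant referent.

```python
from collections import Counter, defaultdict

# Toy (year, abstract-snippet) pairs standing in for the real corpus of
# NLP conference papers described above — illustrative data, not real results.
corpus = [
    (2015, "we train an rnn language model with lstm units"),
    (2016, "our lstm language model improves perplexity"),
    (2019, "a transformer language model pretrained on large text"),
    (2020, "the transformer language model is fine-tuned downstream"),
    (2023, "the llm exhibits emergent abilities as a large language model"),
    (2024, "prompting the llm, a large language model, for reasoning"),
]

# Hypothetical vocabulary of architecture markers to track.
ARCH_TERMS = {"rnn", "lstm", "transformer", "llm"}

def cooccurrence_by_year(corpus):
    """Count architecture terms co-occurring with 'language model' per year."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        if "language model" in text:
            for tok in text.split():
                tok = tok.strip(",.")
                if tok in ARCH_TERMS:
                    counts[year][tok] += 1
    return counts

def dominant_referent(counts, year):
    """Most frequent co-occurring architecture term in a given year."""
    return counts[year].most_common(1)[0][0]

counts = cooccurrence_by_year(corpus)
for year in sorted(counts):
    print(year, dominant_referent(counts, year))
```

On this toy data the dominant co-occurring term moves from recurrent architectures to "transformer" to "llm", mirroring the RNN-based → Transformer-based → LLM-based trajectory reported above; the real study of course relies on a full corpus and richer semantic tracking rather than raw token counts.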
📝 Abstract
The term *Language Models* (LMs), understood as a time-specific collection of models of interest, is constantly reinvented, its referents updated much as the *Ship of Theseus* replaces its parts yet remains, in essence, the same ship. In this paper, we investigate this *Ship of Language Models* problem, wherein scientific evolution takes the form of continuous, implicit retrofits of key existing terms. We seek to initiate a novel perspective on scientific progress, complementing the more well-studied emergence of new terms. To this end, we construct a data infrastructure from recent NLP publications. We then perform a series of text-based analyses toward a detailed, quantitative understanding of the use of Language Models as a term of art. Our work highlights how systems and theories influence each other in scientific discourse, and we call attention to the transformation of this Ship, to which we are all contributing.