Estranged Predictions: Measuring Semantic Category Disruption with Masked Language Modelling

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how science fiction achieves cognitive estrangement by letting meaning cross the ontological categories of human, animal, and machine. Method: The authors computationally operationalize Suvin's estrangement theory, measuring semantic perturbation with three metrics: retention rate, replacement rate, and entropy. RoBERTa generates substitute tokens for masked referents, Gemini classifies them, and the results are compared across science fiction and general fiction corpora. Contribution/Results: In science fiction, machine referents show significant cross-category substitution and dispersion, while human terms remain stable and act as semantic anchors, which points to an anthropocentric logic behind the genre's ontological reconfiguration. The work combines literary theory with computational linguistics and adds to the methods available to computational literary studies.

📝 Abstract
This paper examines how science fiction destabilises ontological categories by measuring conceptual permeability across the terms human, animal, and machine using masked language modelling (MLM). Drawing on corpora of science fiction (Gollancz SF Masterworks) and general fiction (NovelTM), we operationalise Darko Suvin's theory of estrangement as computationally measurable deviation in token prediction, using RoBERTa to generate lexical substitutes for masked referents and classifying them via Gemini. We quantify conceptual slippage through three metrics: retention rate, replacement rate, and entropy, mapping the stability or disruption of category boundaries across genres. Our findings reveal that science fiction exhibits heightened conceptual permeability, particularly around machine referents, which show significant cross-category substitution and dispersion. Human terms, by contrast, maintain semantic coherence and often anchor substitutional hierarchies. These patterns suggest a genre-specific restructuring within anthropocentric logics. We argue that estrangement in science fiction operates as a controlled perturbation of semantic norms, detectable through probabilistic modelling, and that MLMs, when used critically, serve as interpretive instruments capable of surfacing genre-conditioned ontological assumptions. This study contributes to the methodological repertoire of computational literary studies and offers new insights into the linguistic infrastructure of science fiction.

Problem

Research questions and friction points this paper is trying to address.

Measuring semantic category disruption in science fiction using masked language modeling
Quantifying conceptual permeability across human, animal, and machine categories
Computationally operationalizing estrangement theory through token prediction deviations
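
The masking step behind these questions can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the term lists in `CATEGORY_TERMS` and the helper `mask_referents` are hypothetical, and the only detail taken from RoBERTa itself is its `<mask>` token. Each masked sentence would then be passed to a masked language model to obtain substitute tokens.

```python
import re

MASK_TOKEN = "<mask>"  # the mask token used by RoBERTa tokenizers

# Hypothetical seed lexicon for illustration only; the paper's actual
# referent lists are not given in this summary.
CATEGORY_TERMS = {
    "human": {"human", "man", "woman", "person"},
    "animal": {"animal", "dog", "cat", "beast"},
    "machine": {"machine", "robot", "android", "computer"},
}


def mask_referents(sentence):
    """Return one (masked_sentence, original_term, category) triple
    for every category referent found in the sentence."""
    results = []
    # Match whole alphabetic words so "man" is not found inside "human".
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group(0).lower()
        for category, terms in CATEGORY_TERMS.items():
            if word in terms:
                masked = (
                    sentence[: match.start()]
                    + MASK_TOKEN
                    + sentence[match.end():]
                )
                results.append((masked, word, category))
    return results


if __name__ == "__main__":
    for triple in mask_referents("The robot fed the dog."):
        print(triple)
```

Masking one referent at a time keeps the rest of the sentence intact, so the model's prediction reflects the context around that single term.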

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using masked language modeling to measure estrangement
Classifying lexical substitutes via Gemini for analysis
Quantifying conceptual slippage with three novel metrics
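
One plausible reading of the three metrics is sketched below. The definitions are assumptions, since this summary does not give the paper's formulas: retention rate is taken as the share of substitutes classified into the same category as the masked referent, replacement rate as the share going to each other category, and entropy as the Shannon entropy (in bits) of the resulting label distribution. The function name `category_metrics` is hypothetical.

```python
import math
from collections import Counter


def category_metrics(source_category, substitute_labels):
    """Compute assumed retention, replacement, and entropy values for the
    category labels assigned to substitutes of one masked referent category."""
    counts = Counter(substitute_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one classified substitute")

    # Empirical distribution over the categories assigned to substitutes.
    probs = {label: n / total for label, n in counts.items()}

    # Assumed definition: share of substitutes that stay in the source category.
    retention = probs.get(source_category, 0.0)

    # Assumed definition: share of substitutes moving to each other category.
    replacement = {
        label: p for label, p in probs.items() if label != source_category
    }

    # Shannon entropy in bits; higher values mean more dispersed substitutions.
    entropy = -sum(p * math.log2(p) for p in probs.values())

    return {"retention": retention, "replacement": replacement, "entropy": entropy}


if __name__ == "__main__":
    labels = ["machine", "machine", "human", "animal"]
    print(category_metrics("machine", labels))
```

Under these assumptions, a category whose substitutes all stay in place has a retention of 1 and an entropy of 0, while substitutes spread evenly across categories give the highest entropy.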
Yuxuan Liu
School of Arts, Queen Mary University of London, London, United Kingdom
Haim Dubossarsky
Lecturer, Queen Mary University of London
Natural Language Processing, Computational Linguistics, Language Change
Ruth Ahnert
School of Arts, Queen Mary University of London, London, United Kingdom