🤖 AI Summary
Existing methods fail to model the coupled evolution of language and social structure within online subcultures (e.g., anti-feminist communities), which hinders the induction of socially and temporally grounded in-group lexicons. To address this, we propose the first unified framework that jointly models dynamic word embeddings and user embeddings, enabling quantitative characterization of term–subgroup–time triadic associations. Our approach integrates a dynamic graph neural network with temporal contrastive learning to co-optimize user and word representations. We construct the first linguist-validated, temporally annotated lexicon of the “men’s rights” ecosystem, comprising over one thousand evolving terms with time-varying weights, along with a corresponding benchmark evaluation set. Experiments demonstrate significant improvements over state-of-the-art baselines across multiple subcommunities. Furthermore, our analysis uncovers causal patterns linking intra-group linguistic diffusion to identity polarization.
📝 Abstract
In-group language is an important signifier of group dynamics. This paper proposes a novel method for inducing lexicons of in-group language that incorporates the socio-temporal context of that language. Existing methods for lexicon induction capture neither the evolving nature of in-group language nor the social structure of the community. Using dynamic word and user embeddings trained on conversations from online anti-women communities, our approach outperforms prior methods for lexicon induction. We develop a test set for the task of lexicon induction and a new lexicon of manosphere language, validated by human experts, which quantifies the relevance of each term to a specific sub-community at a given point in time. Finally, we present novel insights into in-group language that illustrate the utility of this approach.
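To make the idea of a term's relevance to a sub-community at a given point in time more concrete, here is a minimal Python sketch. It is not the paper's method, which the abstract does not specify. It assumes, purely for illustration, that word and user embeddings for each time slice live in a shared vector space, that a sub-community can be represented by the centroid of its members' embeddings, and that relevance is the cosine similarity between a term and that centroid. The names `induce_lexicon`, `word_vecs`, `user_vecs`, and `members`, and all toy values, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def induce_lexicon(word_vecs, user_vecs, members, top_k=10):
    """Rank terms for every (community, time slice) pair.

    word_vecs: {time: {term: vector}}   hypothetical per-slice word embeddings
    user_vecs: {time: {user: vector}}   hypothetical per-slice user embeddings
    members:   {community: [user, ...]} hypothetical community membership
    Assumption: words and users share one embedding space per time slice.
    """
    lexicon = {}
    for t, words in word_vecs.items():
        users_at_t = user_vecs.get(t, {})
        for community, users in members.items():
            present = [users_at_t[u] for u in users if u in users_at_t]
            if not present:
                continue  # no active members in this slice
            community_vec = centroid(present)
            scored = sorted(
                ((term, cosine(vec, community_vec)) for term, vec in words.items()),
                key=lambda pair: pair[1],
                reverse=True,
            )
            lexicon[(community, t)] = scored[:top_k]
    return lexicon


# Toy data: the two terms swap positions between slices, the community does not.
word_vecs = {
    2015: {"term_a": [1.0, 0.0], "term_b": [0.0, 1.0]},
    2016: {"term_a": [0.0, 1.0], "term_b": [1.0, 0.0]},
}
user_vecs = {
    2015: {"u1": [1.0, 0.1], "u2": [0.9, 0.0]},
    2016: {"u1": [1.0, 0.2], "u2": [0.8, 0.0]},
}
members = {"community_x": ["u1", "u2"]}

lexicon = induce_lexicon(word_vecs, user_vecs, members, top_k=1)
for key in sorted(lexicon):
    print(key, lexicon[key])
```

In this toy setup the top-ranked term for the same community changes from one time slice to the next, which is the kind of time-varying, community-specific weighting the lexicon is described as providing. A real system would learn the embeddings from conversation data rather than hard-coding them.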