Extracting Information About Publication Venues Using Citation-Informed Transformers

📅 2025-06-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates semantic similarity among computer science academic venues (conferences/journals) and its temporal evolution. To address this, we propose a novel, embedding-based framework for quantifying venue similarity through statistical analysis of document-level SPECTER embeddings—derived from 60,000 papers published across 2015–2023—complemented by hypothesis testing, longitudinal similarity tracking, and t-SNE/UMAP visualization. Our method reveals significant semantic indistinguishability among top-tier conferences (e.g., NeurIPS and ICML), and demonstrates statistically significant convergence (p < 0.01) in the embedding distributions of NLP venues (e.g., ACL, EMNLP) over the past five years. These findings uncover a broader trend of cross-venue semantic convergence within the academic ecosystem. The work establishes an interpretable, reproducible, embedding-driven paradigm for analyzing disciplinary evolution, offering a scalable approach to mapping structural shifts in scholarly communication.

Technology Category

Application Category

📝 Abstract
Scientific document embeddings contain a variety of rich features which can be harnessed for downstream tasks such as recommendation, ranking, and clustering. We explore which tangible insights can be drawn from scientific document embeddings to understand trends in computer science research featured across nine well-known venues. We collect approximately 60,000 scientific documents published between 2015 and 2023 and analyze their embeddings, which we produce with the SPECTER pre-trained language model. In particular, we examine whether similarity between two venues can be measured using the embeddings of the scientific documents they admit for publication. Our findings indicate that some venues within computer science are indistinguishable when only considering the distributions of their document embeddings. We additionally examine whether any two venues are becoming increasingly similar over time and identify a trend of convergence within some venues in our analysis. We discuss the implications of these results and the potential impact on new scientific contributions.
Problem

Research questions and friction points this paper is trying to address.

Measure similarity between venues using document embeddings
Analyze trends in computer science research venues
Examine convergence of venue similarities over time
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SPECTER model for document embeddings
Measures venue similarity via embeddings
Analyzes trends in computer science venues
🔎 Similar Papers
No similar papers found.
B
Brian D. Zimmerman
University of Waterloo
J
Joshua Folkins
University of Waterloo
Olga Vechtomova
Olga Vechtomova
University of Waterloo
Natural Language ProcessingAI and Creativity