Representing the Disciplinary Structure of Physics: A Comparative Evaluation of Graph and Text Embedding Methods

📅 2023-08-30
🏛️ Quantitative Science Studies
📈 Citations: 1
Influential: 0
🤖 AI Summary
This study investigates the effectiveness of graph and text embeddings in reconstructing the hierarchical disciplinary structure of physics, as defined by the Physics and Astronomy Classification Scheme (PACS). Leveraging the APS citation network (graph structure) and full-text content (textual data), we systematically compare node2vec, residual2vec, Doc2Vec, and BERT embeddings. Hierarchical clustering and tree-matching evaluation quantify how well each embedding recovers the ground-truth PACS hierarchy. To our knowledge, this is the first quantitative, domain-specific comparison of graph versus text embeddings for modeling an authoritative scientific classification system. Results show that graph embeddings—particularly residual2vec—substantially outperform text embeddings and conventional methods: top-1 tree-matching accuracy improves by 12.7%, indicating that citation relationships better reflect disciplinary boundaries than lexical semantics. Moreover, neural embeddings consistently surpass non-neural baselines, and graph-based approaches demonstrate superior robustness across evaluation metrics.
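The evaluation idea summarized above (cluster the embeddings hierarchically, then compare the result with the PACS classification) can be illustrated with a minimal sketch. This is not the paper's tree-matching procedure: it uses plain average-linkage clustering with cosine distance and a simple pair-agreement score (the Rand index) at a single level of the hierarchy. The toy vectors, the labels `section_A` and `section_B`, and all function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two clusters with the smallest mean pairwise
    cosine distance until n_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(
                cosine_distance(vectors[i], vectors[j])
                for i in clusters[a]
                for j in clusters[b]
            )
            mean_dist = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or mean_dist < best[0]:
                best = (mean_dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def pair_agreement(clusters, labels):
    """Rand index: share of item pairs on which clusters and labels agree."""
    assignment = {}
    for cluster_id, members in enumerate(clusters):
        for i in members:
            assignment[i] = cluster_id
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum(
        (assignment[i] == assignment[j]) == (labels[i] == labels[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: four "papers" with 2-D embeddings and
# made-up top-level classification labels (not real PACS codes).
toy_embeddings = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
toy_labels = ["section_A", "section_A", "section_B", "section_B"]

clusters = agglomerate(toy_embeddings, n_clusters=2)
score = pair_agreement(clusters, toy_labels)
print(clusters, score)
```

In a real comparison, the vectors would come from each embedding method (node2vec, residual2vec, Doc2Vec, BERT) and the agreement would be measured against the full PACS tree rather than a single flat level.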
📝 Abstract
Recent advances in machine learning offer new ways to represent and study scholarly works and the space of knowledge. Graph and text embeddings provide a convenient vector representation of scholarly works based on citations and text. Yet, it is unclear whether their representations are consistent or provide different views of the structure of science. Here, we compare graph and text embeddings by testing their ability to capture the hierarchical structure of the Physics and Astronomy Classification Scheme (PACS) of papers published by the American Physical Society (APS). We also provide a qualitative comparison of the overall structure of the graph and text embeddings for reference. We find that neural network-based methods outperform traditional methods, and that the graph embedding methods node2vec and residual2vec are better than other methods at capturing the PACS structure. Our results call for further investigations into how different contexts of scientific papers are captured by different methods, and how we can combine and leverage such information in an interpretable manner. Peer review: https://www.webofscience.com/api/gateway/wos/peer-review/10.1162/qss_a_00349
Problem

Research questions and friction points this paper is trying to address.

Compare graph and text embedding methods.
Evaluate PACS structure representation accuracy.
Assess neural network methods' performance.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph embeddings (node2vec, residual2vec) capture the PACS structure better than other methods.
Text embeddings represent scholarly works based on their text.
Neural network-based methods outperform traditional methods.