Mapping scientific communities at scale

📅 2025-01-17
🤖 AI Summary
This study addresses two related challenges: analyzing the structure of scientific communities and identifying emerging topics. Methodologically, it introduces an end-to-end community mapping framework for large-scale scholarly corpora that integrates scanR publication metadata, OpenAlex citation counts, and LLM-based (Mistral Nemo) labeling of the detected communities. Authors, affiliations, and topics are disambiguated by harmonizing persistent identifiers (e.g., ORCID, ISNI, MeSH). Filtering that retains only the strongest interactions between entities, combined with multi-source data fusion, keeps collaboration graphs tractable at scale. Elasticsearch aggregates the data, Graphology handles layout (ForceAtlas2) and community detection (Louvain), and VOSviewer renders the maps, from the national level down to individual laboratories. Key contributions include: (1) a lightweight architecture that can be embedded (via iframes) on the perimeter of any French research institution; (2) high-precision disambiguation driven by persistent identifiers; and (3) on-demand generation of collaboration graphs with a high signal-to-noise ratio. The framework is deployed in scanR, the French national research portal, and its toolchain is fully open source.
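The "strongest interactions" filtering step can be illustrated with a minimal sketch. It does not reproduce the scanR implementation, which aggregates data with Elasticsearch and builds graphs with Graphology. Instead it assumes that each publication has already been reduced to a list of disambiguated author identifiers, that an edge's weight is the number of co-authored publications, and that "strongest" can be approximated by a minimum weight plus a cap on the number of kept edges. The functions `coauthorship_edges` and `strongest_edges`, the parameters `max_edges` and `min_weight`, and the sample identifiers are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(publications):
    """Count how many publications each unordered pair of authors shares."""
    weights = Counter()
    for authors in publications:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(authors)), 2):
            weights[(a, b)] += 1
    return weights


def strongest_edges(weights, max_edges, min_weight=1):
    """Keep at most `max_edges` edges with weight >= `min_weight`, heaviest first."""
    kept = [(pair, w) for pair, w in weights.items() if w >= min_weight]
    # Sort by descending weight, then by pair for a deterministic order on ties.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return dict(kept[:max_edges])


if __name__ == "__main__":
    # Hypothetical author identifiers; in practice these would be
    # disambiguated persistent IDs.
    pubs = [
        ["a1", "a2", "a3"],
        ["a1", "a2"],
        ["a2", "a3"],
        ["a1", "a2", "a4"],
    ]
    edges = strongest_edges(coauthorship_edges(pubs), max_edges=2, min_weight=2)
    print(edges)  # prints {('a1', 'a2'): 3, ('a2', 'a3'): 2}
```

Capping the number of edges, rather than only thresholding weights, keeps the graph size bounded however large the corpus is, which is one plausible way to keep national-scale maps readable; the paper's actual filtering criteria may differ.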

📝 Abstract
This study introduces a novel methodology for mapping scientific communities at scale, addressing the challenges of network analysis in large bibliometric datasets. By leveraging enriched publication metadata from the French research portal scanR and applying advanced filtering techniques that prioritize the strongest interactions between entities, we construct detailed, scalable network maps. These maps are enhanced through systematic disambiguation of authors, affiliations, and topics using persistent identifiers and specialized algorithms. The proposed framework integrates Elasticsearch for efficient data aggregation, Graphology for network spatialization (ForceAtlas2) and community detection (Louvain algorithm), and VOSviewer for network visualization. A Large Language Model (Mistral Nemo) is used to label the detected communities, and OpenAlex data enriches the results with estimated citation counts, which are used to detect hot topics. This scalable approach enables insightful exploration of research collaborations and thematic structures, with potential applications for strategic decision-making in science policy and funding. These web tools are effective at the global (national) scale, but they are also available, and can be embedded via iframes, on the perimeter of any French research institution, from large research organizations to individual laboratories. The scanR community analysis tool is available online at [https://scanr.enseignementsup-recherche.gouv.fr/networks/get-started](https://scanr.enseignementsup-recherche.gouv.fr/networks/get-started). All tools and methodologies are open source in the repository [https://github.com/dataesr/scanr-ui](https://github.com/dataesr/scanr-ui).
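To make the hot-topic idea concrete, here is a hypothetical heuristic, not the authors' scoring method. It assumes the communities have already been detected (for example by Louvain) and keyed by a label, such as one an LLM might generate, and that each publication carries an estimated citation count. Communities are then ranked by the mean citations of their recent publications. The function `hot_topics`, the `since_year` cutoff, and the sample data are illustrative assumptions.

```python
from statistics import mean


def hot_topics(communities, since_year, top_n=3):
    """Rank communities by the mean citation count of their recent publications.

    `communities` maps a community label to a list of (year, citations) pairs.
    Communities with no publication since `since_year` are skipped.
    """
    scores = {}
    for label, papers in communities.items():
        recent = [cites for year, cites in papers if year >= since_year]
        if recent:
            scores[label] = mean(recent)
    # Highest mean first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical communities with (publication year, estimated citations).
    communities = {
        "graph learning": [(2023, 12), (2024, 8), (2019, 40)],
        "soil microbiology": [(2022, 3), (2024, 1)],
        "quantum sensing": [(2024, 20)],
        "medieval history": [(2015, 5)],
    }
    # prints "quantum sensing: 20.0" then "graph learning: 10.0"
    for label, score in hot_topics(communities, since_year=2022, top_n=2):
        print(f"{label}: {score:.1f}")
```

Using the mean rather than the total avoids favoring large communities simply because they publish more. Recent papers have had less time to accumulate citations, so a production scoring scheme would likely also normalize by publication age or field.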
Problem

Research questions and friction points this paper is trying to address.

Scientific Community Analysis
Visualization of Research Patterns
Identification of Hot Topics
Innovation

Methods, ideas, or system contributions that make the work stand out.

scanR Platform
Scientific Collaboration Mapping
Data Visualization Tools
Victor Barbier
National Institute for Research in Digital Science and Technology, INRIA, Paris, France
Eric Jeangirard
French Ministry of Higher Education, Research and Innovation
open science, open access, machine learning, natural language processing