Use of diverse data sources to control which topics emerge in a science map

📅 2024-12-10
🏛️ arXiv.org
🤖 AI Summary
Traditional science maps visualize topics by clustering documents, but the clustering is biased toward some topics over others. This paper explores document networks built from diverse data sources (Facebook users, Twitter users and conversations, patent families, policy documents, and document authors) as a way to control that bias and tailor science maps to different needs. The authors evaluate how effectively several topic categories cluster across two traditional and six non-traditional data sources and identify source-specific biases (e.g., Facebook users favor health topics; document authors strongly favor geographical entities). The topics favored by the six non-traditional sources are health, biotechnology, government and social issues, food, nursing, and geographical entities, which indicates that the choice of data source can steer which topics emerge in a map.

📝 Abstract
Traditional science maps visualize topics by clustering documents, but they are inherently biased toward clustering certain topics over others. If these topics could be chosen, then the science maps could be tailored for different needs. In this paper, we explore the use of document networks from diverse data sources as a tool to control the topic clustering bias of a science map. We analyze this by evaluating the clustering effectiveness of several topic categories over two traditional and six non-traditional data sources. We found that each non-traditional data source favors a different topic: health for Facebook users, biotechnology for patent families, government and social issues for policy documents, food for Twitter conversations, nursing for Twitter users, and geographical entities for document authors (the bias in this last source was particularly strong). Our results show that diverse data sources can be used to control topic bias, which opens up the possibility of creating science maps tailored for different needs.
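To make the idea of a source-specific document network concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's method: the abstract does not specify how clustering effectiveness is measured, so the function `topic_edge_ratio`, the toy documents, and the author names are all hypothetical. The sketch links two documents when they share an entity from a data source (here, an author) and then compares the observed share of edges that fall inside a topic with the share expected if edges ignored topics.

```python
from collections import defaultdict
from itertools import combinations


def build_document_network(doc_to_entities):
    """Link two documents when they share at least one entity (e.g. an author)."""
    entity_to_docs = defaultdict(set)
    for doc, entities in doc_to_entities.items():
        for entity in entities:
            entity_to_docs[entity].add(doc)
    edges = set()
    for docs in entity_to_docs.values():
        # Sorting gives each undirected edge one canonical (a, b) form.
        for a, b in combinations(sorted(docs), 2):
            edges.add((a, b))
    return edges


def topic_edge_ratio(edges, doc_topics, topic):
    """Hypothetical toy score: observed share of within-topic edges divided by
    the share expected if edges were placed without regard to topic.
    Values above 1 suggest the source tends to connect documents on that topic."""
    n = len(doc_topics)
    k = sum(1 for t in doc_topics.values() if t == topic)
    if not edges or n < 2 or k < 2:
        return 0.0
    within = sum(
        1 for a, b in edges
        if doc_topics[a] == topic and doc_topics[b] == topic
    )
    observed = within / len(edges)
    expected = (k * (k - 1)) / (n * (n - 1))
    return observed / expected


# Toy data (invented for illustration): documents, their authors, and one topic each.
doc_authors = {
    "d1": {"alice"}, "d2": {"alice", "bob"}, "d3": {"bob"},
    "d4": {"carol"}, "d5": {"carol"}, "d6": {"dave"},
}
doc_topics = {
    "d1": "geography", "d2": "geography", "d3": "geography",
    "d4": "health", "d5": "health", "d6": "health",
}

author_edges = build_document_network(doc_authors)
for topic in ("geography", "health"):
    print(topic, round(topic_edge_ratio(author_edges, doc_topics, topic), 2))
```

In this toy example the author-based network scores higher for geography than for health, which mirrors the kind of source-specific bias the abstract reports. Swapping `doc_authors` for a mapping built from another source (for instance, the users who shared each document) would yield a different network over the same documents, and possibly a different favored topic.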
Problem

Research questions and friction points this paper is trying to address.

Control topic bias in science maps using diverse data sources
Evaluate clustering effectiveness across traditional and non-traditional data sources
Tailor science maps for specific needs by selecting favored topics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Use diverse data sources to control topic bias
Evaluate clustering effectiveness across multiple sources
Tailor science maps for specific topic needs