🤖 AI Summary
Identifying seminal papers that actively generate “knowledge gaps”—topological voids between concepts in scientific knowledge networks—remains a critical challenge for understanding scientific innovation.
Method: We formalize knowledge gaps as drivers of innovation and construct a concept co-occurrence network from more than 34 million papers in the Microsoft Academic Graph. For the first time, we apply computational topology, specifically persistent homology, to detect topological gaps in this network and identify the papers that open them.
Contribution/Results: Papers that open such gaps are more likely to rank among the most highly cited works (top 1% to 20%) and score higher on disruptiveness indices. Subsequent research frequently concentrates on filling these gaps, thereby steering field-level attention and evolutionary trajectories. This work offers a computationally tractable approach to predicting breakthrough research and optimizing scientific resource allocation.
📝 Abstract
Knowledge production is often viewed as an endogenous process in which discovery arises through the recombination of existing theories, findings, and concepts. Yet given the vast space of potential recombinations, not all are equally valuable, and identifying those that may prove most generative remains challenging. We argue that a crucial form of recombination occurs when linking concepts creates knowledge gaps: empty regions in the conceptual landscape that focus scientific attention on proximal, unexplored connections and signal promising directions for future research. Using computational topology, we develop a method to systematically identify knowledge gaps in science at scale. Applying this approach to millions of articles from Microsoft Academic Graph (n = 34,363,623) over a 120-year period (1900–2020), we uncover papers that create topological gaps in concept networks and track how these gap-opening works reshape the scientific knowledge landscape. Our results indicate that gap-opening papers are more likely to rank among the most highly cited works (top 1% to 20%) than papers that do not introduce novel concept pairings. In contrast, papers that introduce novel combinations without opening gaps are not more likely to rank in the top 1% for citation counts, and are even less likely than baseline papers to appear in the top 5% to 20%. Our findings also suggest that gap-opening papers are more disruptive, highlighting their generative role in stimulating new directions for scientific inquiry.
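To make the idea of a gap-opening paper concrete, the sketch below shows a toy version of the detection step. It is not the authors' pipeline: instead of full persistent homology, it recomputes the first Betti number (the count of unfilled loops, over GF(2)) of the clique complex of a small concept network after each paper is added, and flags a paper when that count rises. The function names `rank_gf2`, `betti_1`, and `gap_openers`, the input format, and the example papers are all assumptions made for illustration.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # leading-bit position -> reduced row
    for row in rows:
        while row:
            lead = row.bit_length() - 1
            if lead not in basis:
                basis[lead] = row
                break
            row ^= basis[lead]  # cancel the leading bit and keep reducing
    return len(basis)


def betti_1(edges):
    """Number of unfilled loops in the clique complex of a concept graph.

    edges: iterable of concept pairs (no self-loops assumed).
    Uses b1 = |E| - rank(d1) - rank(d2), with coefficients in GF(2).
    """
    edges = sorted({tuple(sorted(e)) for e in edges})
    vertices = sorted({v for e in edges for v in e})
    v_bit = {v: 1 << i for i, v in enumerate(vertices)}
    e_bit = {e: 1 << i for i, e in enumerate(edges)}
    # Boundary of an edge: its two endpoints.
    boundary_1 = [v_bit[a] | v_bit[b] for a, b in edges]
    # Boundary of a triangle: its three edges. A triangle exists wherever
    # three concepts are pairwise linked (clique complex assumption).
    boundary_2 = [
        e_bit[(a, b)] | e_bit[(a, c)] | e_bit[(b, c)]
        for a, b, c in combinations(vertices, 3)
        if (a, b) in e_bit and (a, c) in e_bit and (b, c) in e_bit
    ]
    return len(edges) - rank_gf2(boundary_1) - rank_gf2(boundary_2)


def gap_openers(papers):
    """Return ids of papers whose new concept links raise the loop count.

    papers: chronologically ordered list of (paper_id, [concept pairs]).
    """
    edges, openers, before = set(), [], 0
    for paper_id, pairs in papers:
        edges |= {tuple(sorted(p)) for p in pairs}
        after = betti_1(edges)
        if after > before:
            openers.append(paper_id)
        before = after
    return openers


# Hypothetical toy history: p4 closes the loop A-B-C-D (opening a gap),
# and p5 links A-C, which fills that gap with two triangles.
papers = [
    ("p1", [("A", "B")]),
    ("p2", [("B", "C")]),
    ("p3", [("C", "D")]),
    ("p4", [("D", "A")]),
    ("p5", [("A", "C")]),
]
print(gap_openers(papers))  # ['p4']
```

In this toy history, p5 introduces a novel concept pairing but opens no gap, which mirrors the abstract's distinction between gap-opening papers and papers that merely add new combinations. Enumerating all concept triples is only feasible for tiny graphs; an analysis at the scale described here would need a dedicated persistent homology library.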