What's In Your Field? Mapping Scientific Research with Knowledge Graphs and Large Language Models

📅 2025-03-12



🤖 AI Summary

The explosive growth of scientific literature makes it hard to navigate and integrate knowledge across disciplines. Method: This paper proposes a lightweight, LLM-driven approach to building structured representations of the literature by pairing large language models with a schema of scientific concepts. The approach complements, rather than replaces, unstructured methods such as retrieval-augmented generation, which become cost-prohibitive when millions of facts bear on an answer. The schema applies across scientific fields, and concept extraction requires only 20 manually annotated abstracts. The authors apply the pipeline to 30,000 arXiv papers spanning astrophysics, fluid dynamics, and evolutionary biology and assemble the extracted concepts into a knowledge graph. Contribution/Results: The resulting system answers precise questions about the literature as a whole and highlights emerging trends, and visualizing the knowledge graph offers a way to explore the field. The code is open source, and a demo is available on Hugging Face Spaces.
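A minimal sketch of the extraction step follows, assuming few-shot prompting: hand-annotated abstracts are placed in the prompt as worked examples, and the model's reply is checked against the schema. The four category names in `SCHEMA`, the JSON reply format, and the helpers `Example`, `build_prompt`, and `parse_response` are hypothetical illustrations, not the paper's actual schema or code. The model call itself is omitted and replaced by a hard-coded reply.

```python
import json
from dataclasses import dataclass

# Hypothetical concept categories; the paper's actual schema may differ.
SCHEMA = {
    "object": "A physical or abstract entity under study (e.g. a galaxy, a fluid flow, a species).",
    "method": "A technique, instrument, or analysis procedure.",
    "model": "A theoretical or computational model.",
    "data": "A dataset, survey, or observational campaign.",
}


@dataclass
class Example:
    abstract: str
    concepts: dict[str, list[str]]  # category -> spans tagged by a human annotator


def build_prompt(examples: list[Example], new_abstract: str) -> str:
    """Assemble a few-shot extraction prompt from hand-annotated abstracts."""
    lines = ["Tag scientific concepts in the abstract using these categories:"]
    for name, desc in SCHEMA.items():
        lines.append(f"- {name}: {desc}")
    lines.append("Answer with a JSON object mapping each category to a list of spans.")
    for ex in examples:
        lines.append(f"\nAbstract: {ex.abstract}")
        lines.append(f"Concepts: {json.dumps(ex.concepts)}")
    lines.append(f"\nAbstract: {new_abstract}")
    lines.append("Concepts:")
    return "\n".join(lines)


def parse_response(text: str, abstract: str) -> dict[str, list[str]]:
    """Keep only schema categories and spans that occur verbatim in the abstract."""
    raw = json.loads(text)
    return {
        cat: [span for span in spans if span in abstract]
        for cat, spans in raw.items()
        if cat in SCHEMA
    }


# Usage with one annotated example (a stand-in for a set of ~20).
shots = [Example("We simulate turbulent flow with large-eddy simulation.",
                 {"object": ["turbulent flow"], "method": ["large-eddy simulation"]})]
target = "We observe quasars with the SDSS survey."
prompt = build_prompt(shots, target)
reply = '{"object": ["quasars"], "data": ["SDSS survey"], "mood": ["happy"]}'  # stand-in for model output
print(parse_response(reply, target))
# {'object': ['quasars'], 'data': ['SDSS survey']}
```

Dropping unknown categories and spans that do not appear verbatim in the abstract is one simple guard against the model inventing concepts.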


📝 Abstract

The scientific literature's exponential growth makes it increasingly challenging to navigate and synthesize knowledge across disciplines. Large language models (LLMs) are powerful tools for understanding scientific text, but they fail to capture detailed relationships across large bodies of work. Unstructured approaches, like retrieval augmented generation, can sift through such corpora to recall relevant facts; however, when millions of facts influence the answer, unstructured approaches become cost prohibitive. Structured representations offer a natural complement -- enabling systematic analysis across the whole corpus. Recent work enhances LLMs with unstructured or semistructured representations of scientific concepts; to complement this, we try extracting structured representations using LLMs. By combining LLMs' semantic understanding with a schema of scientific concepts, we prototype a system that answers precise questions about the literature as a whole. Our schema applies across scientific fields and we extract concepts from it using only 20 manually annotated abstracts. To demonstrate the system, we extract concepts from 30,000 papers on arXiv spanning astrophysics, fluid dynamics, and evolutionary biology. The resulting database highlights emerging trends and, by visualizing the knowledge graph, offers new ways to explore the ever-growing landscape of scientific knowledge. Demo: abby101/surveyor-0 on HF Spaces. Code: https://github.com/chiral-carbon/kg-for-science.
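To illustrate why a structured representation makes corpus-wide questions cheap, here is a sketch that stores extracted concept mentions in an in-memory SQLite table and answers a trend question with a single aggregate query, instead of retrieving and reading passages. The table layout and the `build_db` and `papers_per_year` helpers are assumptions for illustration, and the rows are made-up toy data, not results from the paper.

```python
import sqlite3


def build_db(rows):
    """Load (paper_id, year, category, concept) tuples into an in-memory table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mentions (paper_id TEXT, year INTEGER, category TEXT, concept TEXT)"
    )
    con.executemany("INSERT INTO mentions VALUES (?, ?, ?, ?)", rows)
    return con


def papers_per_year(con, concept):
    """Count distinct papers mentioning `concept` in each year, oldest first."""
    cur = con.execute(
        "SELECT year, COUNT(DISTINCT paper_id) FROM mentions "
        "WHERE concept = ? GROUP BY year ORDER BY year",
        (concept,),
    )
    return cur.fetchall()


# Toy rows, invented for illustration only.
toy_rows = [
    ("p1", 2021, "method", "neural operator"),
    ("p2", 2022, "method", "neural operator"),
    ("p3", 2022, "method", "neural operator"),
    ("p3", 2022, "object", "turbulent flow"),
    ("p4", 2023, "method", "neural operator"),
    ("p5", 2023, "method", "neural operator"),
    ("p6", 2023, "method", "neural operator"),
]
con = build_db(toy_rows)
print(papers_per_year(con, "neural operator"))
# [(2021, 1), (2022, 2), (2023, 3)]
```

The query touches every row in the table, yet its cost does not depend on an LLM reading each matching paper, which is the advantage structured extraction offers over retrieval when many facts bear on the answer.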

Problem

Research questions and friction points this paper is trying to address.

Navigating and synthesizing knowledge across disciplines is challenging due to the exponential growth of the scientific literature.

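Large language models fail to capture detailed relationships across large scientific corpora, and unstructured approaches such as retrieval-augmented generation become cost-prohibitive when millions of facts bear on an answer.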

Systematic, corpus-wide analysis and precise question answering across scientific fields call for structured representations that LLMs alone do not provide.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with a schema of scientific concepts that applies across fields

Extracts concepts using only 20 manually annotated abstracts

Builds a knowledge graph from 30,000 arXiv papers and visualizes it to explore emerging scientific trends
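One simple way to turn per-paper concepts into an explorable graph is to link concepts that appear in the same paper and weight each edge by the number of such papers. The sketch below assumes that co-occurrence construction; the paper's knowledge graph may use different node and edge types, and the helper names `cooccurrence_graph` and `neighbors` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(papers: dict[str, set[str]]) -> Counter:
    """Weighted undirected edges: (concept_a, concept_b) -> number of shared papers."""
    edges = Counter()
    for concepts in papers.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges: Counter, concept: str) -> list[tuple[str, int]]:
    """Concepts linked to `concept`, strongest links first."""
    out = Counter()
    for (a, b), weight in edges.items():
        if a == concept:
            out[b] += weight
        elif b == concept:
            out[a] += weight
    return out.most_common()


# Toy data, invented for illustration only.
papers = {
    "p1": {"galaxy", "dark matter", "N-body simulation"},
    "p2": {"dark matter", "N-body simulation"},
    "p3": {"galaxy", "spectroscopy"},
}
edges = cooccurrence_graph(papers)
print(neighbors(edges, "dark matter"))
# [('N-body simulation', 2), ('galaxy', 1)]
```

The weighted edge list is a convenient input for a graph visualization tool, where heavily weighted edges point to concepts that are frequently studied together.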
