SciDaSynth: Interactive Structured Knowledge Extraction and Synthesis from Scientific Literature with Large Language Model

📅 2024-04-21
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
📄 PDF
🤖 AI Summary
Scientific literature is inherently multimodal, heterogeneous, and unstructured, posing significant challenges for existing knowledge extraction systems in achieving cross-document consistency and dynamic adaptation to user intent. To address this, we propose the first LLM-driven interactive knowledge structuring paradigm, integrating prompt engineering, structured output control, conversational state management, and multi-granularity visual exploration. This enables researchers to automatically generate structured tables via natural language queries while collaboratively verifying and iteratively refining outputs. Our approach overcomes key limitations of conventional automated systems: it maintains high accuracy and coverage while reducing manual correction effort by over 40%. Empirical evaluation demonstrates substantial improvements in the efficiency of constructing high-quality scientific knowledge bases, offering a novel paradigm for domain-specific knowledge graph construction and reproducible research.

Technology Category

Application Category

📝 Abstract
Extraction and synthesis of structured knowledge from extensive scientific literature are crucial for advancing and disseminating scientific progress. Although many existing systems facilitate literature review and digest, they struggle to process multimodal, varied, and inconsistent information within and across the literature into structured data. We introduce SciDaSynth, a novel interactive system powered by large language models (LLMs) that enables researchers to efficiently build structured knowledge bases from scientific literature at scale. The system automatically creates data tables to organize and summarize users' interested knowledge in literature via question-answering. Furthermore, it provides multi-level and multi-faceted exploration of the generated data tables, facilitating iterative validation, correction, and refinement. Our within-subjects study with researchers demonstrates the effectiveness and efficiency of SciDaSynth in constructing quality scientific knowledge bases. We further discuss the design implications for human-AI interaction tools for data extraction and structuring.
Problem

Research questions and friction points this paper is trying to address.

Extracting structured knowledge from multimodal scientific literature
Processing inconsistent information across diverse research papers
Building scalable interactive systems for literature-based synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-powered interactive system for knowledge extraction
Automated data table creation via question-answering
Multi-level exploration for iterative validation refinement
🔎 Similar Papers
No similar papers found.
X
Xingbo Wang
Weill Cornell Medicine, New York, USA
S
Samantha L. Huey
Cornell University, Ithaca, USA
R
Rui Sheng
Hong Kong University of Science and Technology, Hong Kong, China
S
Saurabh Mehta
Cornell University, Ithaca, USA
F
Fei Wang
Weill Cornell Medicine, New York, USA