🤖 AI Summary
Scientific literature is inherently multimodal, heterogeneous, and unstructured, posing significant challenges for existing knowledge extraction systems in achieving cross-document consistency and dynamic adaptation to user intent. To address this, we propose the first LLM-driven interactive knowledge structuring paradigm, integrating prompt engineering, structured output control, conversational state management, and multi-granularity visual exploration. This enables researchers to automatically generate structured tables via natural language queries while collaboratively verifying and iteratively refining outputs. Our approach overcomes key limitations of conventional automated systems: it maintains high accuracy and coverage while reducing manual correction effort by over 40%. Empirical evaluation demonstrates substantial improvements in the efficiency of constructing high-quality scientific knowledge bases, offering a novel paradigm for domain-specific knowledge graph construction and reproducible research.
📝 Abstract
Extraction and synthesis of structured knowledge from extensive scientific literature are crucial for advancing and disseminating scientific progress. Although many existing systems facilitate literature review and digestion, they struggle to process multimodal, varied, and inconsistent information within and across the literature into structured data. We introduce SciDaSynth, a novel interactive system powered by large language models (LLMs) that enables researchers to efficiently build structured knowledge bases from scientific literature at scale. The system automatically creates data tables that organize and summarize the knowledge users are interested in from the literature via question-answering. Furthermore, it provides multi-level and multi-faceted exploration of the generated data tables, facilitating iterative validation, correction, and refinement. Our within-subjects study with researchers demonstrates the effectiveness and efficiency of SciDaSynth in constructing high-quality scientific knowledge bases. We further discuss the design implications for human-AI interaction tools for data extraction and structuring.