Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System

📅 2026-03-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. This work presents SCILIRE, a system that builds human-AI teaming into an iterative extraction workflow: researchers review and correct outputs generated by large language models, and those corrections serve as a feedback signal to improve subsequent LLM-based inference. Evaluated on intrinsic benchmarks and real-world case studies across multiple domains, the approach improves extraction fidelity and supports efficient creation of scientific datasets.

📝 Abstract

The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around Human-AI teaming principles centred on workflows for verifying and curating data. It facilitates an iterative workflow in which researchers can review and correct AI outputs. Furthermore, this interaction is used as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking outcomes together with real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.
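The review-and-correct loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say how SCILIRE turns corrections into a feedback signal; the version below assumes, purely for illustration, that recent corrections are replayed as few-shot examples in the next prompt. All names (`FeedbackStore`, `build_prompt`, `extract_with_review`) and the `model` and `review` callables are hypothetical and are not part of SCILIRE.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]  # one extracted row: field name -> value


@dataclass
class Correction:
    passage: str       # source text the record was extracted from
    corrected: Record  # record after human review


@dataclass
class FeedbackStore:
    """Keeps human corrections so later prompts can reuse them (assumed design)."""
    corrections: List[Correction] = field(default_factory=list)

    def add(self, passage: str, corrected: Record) -> None:
        self.corrections.append(Correction(passage, corrected))

    def as_examples(self, limit: int = 3) -> List[Correction]:
        # Return the most recent corrections, oldest first.
        return self.corrections[-limit:]


def build_prompt(passage: str, store: FeedbackStore) -> str:
    """Assemble a prompt that shows past corrections before the new passage."""
    lines = ["Extract the fields as key=value pairs."]
    for ex in store.as_examples():
        pairs = "; ".join(f"{k}={v}" for k, v in sorted(ex.corrected.items()))
        lines.append(f"Text: {ex.passage}")
        lines.append(f"Fields: {pairs}")
    lines.append(f"Text: {passage}")
    lines.append("Fields:")
    return "\n".join(lines)


def extract_with_review(
    passage: str,
    model: Callable[[str], Record],           # hypothetical stand-in for an LLM call
    review: Callable[[str, Record], Record],  # hypothetical stand-in for the human reviewer
    store: FeedbackStore,
) -> Record:
    """Run one extract -> review -> record-feedback cycle."""
    draft = model(build_prompt(passage, store))
    final = review(passage, draft)
    if final != draft:
        # Only genuine corrections are kept as feedback.
        store.add(passage, final)
    return final
```

Calling `extract_with_review` once per passage with a shared `FeedbackStore` means each correction a reviewer makes is visible to the model on the following call. Other feedback mechanisms, such as fine-tuning or retrieval over past corrections, would fit the same loop.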

Problem

Research questions and friction points this paper is trying to address.

scientific literature

structured knowledge extraction

dataset creation

human-AI teaming

LLM-based inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-AI teaming

scientific dataset curation

LLM-based inference