Using a Human-AI Teaming Approach to Create and Curate Scientific Datasets with the SCILIRE System

📅 2026-03-13
🤖 AI Summary
The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this, the work proposes SCILIRE, a system that builds human-AI collaboration into an iterative knowledge extraction pipeline: researchers review and correct outputs generated by large language models, and their corrections serve as a feedback signal to improve subsequent inference. Intrinsic benchmarks and real-world case studies across multiple domains indicate that the approach improves extraction fidelity and supports efficient construction of scientific datasets.

📝 Abstract
The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE has been designed around Human-AI teaming principles centred on workflows for verifying and curating data. It facilitates an iterative workflow in which researchers can review and correct AI outputs. Furthermore, this interaction is used as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking outcomes together with real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and facilitates efficient dataset creation.
Problem

Research questions and friction points this paper is trying to address.

scientific literature
structured knowledge extraction
dataset creation
human-AI teaming
LLM-based inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-AI teaming
scientific dataset curation
LLM-based inference
iterative feedback
knowledge extraction
Necva Bölücü
CSIRO, Sydney, Australia
Jessica Irons
CSIRO, Sydney, Australia
Changhyun Lee
Professor of Radiology, Seoul National University, College of Medicine
Radiology, thoracic
Brian Jin
CSIRO, Sydney, Australia
Maciej Rybinski
ITIS, University of Málaga, Málaga, Spain
Huichen Yang
CSIRO, Sydney, Australia
Andreas Duenser
CSIRO - Data61
Human Factors, trust, human-AI collaboration, collaborative intelligence, human-AI workflows
Stephen Wan
Data61 CSIRO
computational linguistics