🤖 AI Summary
The exponential growth of scientific literature has made manual extraction of structured knowledge increasingly difficult. To address this, this work proposes SCILIRE, a system that embeds human–machine collaboration in an iterative knowledge extraction pipeline: researchers review and correct outputs generated by large language models, and their corrections are fed back to refine subsequent inference. By combining large language models, an interactive feedback mechanism, and an iterative validation workflow, the approach improves the fidelity of information extraction across multiple scientific domains, enabling efficient construction of high-quality scientific datasets.
📝 Abstract
The rapid growth of scientific literature has made manual extraction of structured knowledge increasingly impractical. To address this challenge, we introduce SCILIRE, a system for creating datasets from scientific literature. SCILIRE is designed around Human-AI teaming principles, centred on workflows for verifying and curating data. It supports an iterative workflow in which researchers review and correct AI outputs, and this interaction serves as a feedback signal to improve future LLM-based inference. We evaluate our design using a combination of intrinsic benchmarking and real-world case studies across multiple domains. The results demonstrate that SCILIRE improves extraction fidelity and enables efficient dataset creation.