🤖 AI Summary
This study evaluates GPT-4's ability to extract structured information on demand from the materials science literature, specifically assessing how faithfully it can reproduce, zero-shot, two manually curated materials datasets when given the source manuscripts. Methodologically, the authors use a domain-expert-driven error analysis with careful human annotation to diagnose where model outputs deviate from the curated data, covering issues such as numerical accuracy, contextual disambiguation, and unit standardization. The results reveal notable fidelity gaps in GPT-4's scientific information extraction (IE), particularly on precision-critical fields. The key contribution is a detailed, expert-led error analysis of scientific IE that characterizes the reliability limits of large language models in realistic research settings and draws on the experts' insights to suggest research directions toward high-fidelity AI-assisted scientific discovery.
📝 Abstract
We explore the ability of GPT-4 to perform ad hoc, schema-based information extraction from scientific literature. Specifically, we assess whether it can, with a basic prompting approach, replicate two existing materials science datasets, given the manuscripts from which they were originally manually extracted. We employ materials scientists to perform a detailed manual error analysis assessing where the model struggles to faithfully extract the desired information, and we draw on their insights to suggest research directions for addressing this broadly important task.
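To make the setup concrete, the sketch below shows what zero-shot, schema-based extraction with GPT-4 might look like; the schema fields, prompt wording, and model name are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of zero-shot, schema-based extraction (illustrative only;
# the schema fields, prompt, and model choice are assumptions, not the
# authors' exact setup).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical target schema for a single materials property record.
SCHEMA = {
    "material": "chemical formula or common name",
    "property": "name of the measured property",
    "value": "numeric value as reported",
    "unit": "unit of measurement",
}

def extract_records(manuscript_text: str) -> list[dict]:
    """Ask the model to emit JSON records matching SCHEMA from one paper."""
    prompt = (
        "Extract every materials property measurement from the text below. "
        "Return a JSON list of objects with exactly these keys:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Text:\n{manuscript_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output to ease comparison with the curated dataset
    )
    # Assumes the model returns valid JSON; real pipelines need parsing/repair logic.
    return json.loads(response.choices[0].message.content)
```

Comparing the records such a call produces against the manually curated entries is exactly where the expert error analysis comes in: mismatches can then be attributed to numerical errors, missed context, unit inconsistencies, and similar failure modes.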