SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature

📅 2024-06-10
🏛️ arXiv.org
📈 Citations: 17
Influential: 2
🤖 AI Summary
To help language models follow instructions over the rapidly growing body of scientific literature, this paper introduces SciRIFF, a dataset of 137K instruction-following demonstrations spanning 54 tasks. SciRIFF is the first instruction-tuning resource focused on extracting and synthesizing information from research literature across a wide range of scientific fields. Its tasks cover five core capabilities: information extraction, summarization, question answering, claim verification, and classification, and are notable for long input contexts and complex structured outputs. The authors propose a sample-efficient adaptation strategy: further finetuning a general instruction-following model on a mix of general-domain and SciRIFF demonstrations. The resulting model, SciTulu, improves over a strong LLM baseline on nine held-out scientific tasks by 28.1% at the 7B scale and 6.5% at the 70B scale, while keeping general instruction-following performance within 2% of the baseline. The dataset, model checkpoints, and data processing and evaluation code are all released.

📝 Abstract
We present SciRIFF (Scientific Resource for Instruction-Following and Finetuning), a dataset of 137K instruction-following demonstrations for 54 tasks covering five essential scientific literature understanding capabilities: information extraction, summarization, question answering, claim verification, and classification. SciRIFF demonstrations are notable for their long input contexts, detailed task specifications, and complex structured outputs. While instruction-following resources are available in specific domains such as clinical medicine and chemistry, SciRIFF is the first dataset focused on extracting and synthesizing information from research literature across a wide range of scientific fields. To demonstrate the utility of SciRIFF, we develop a sample-efficient strategy to adapt a general instruction-following model for science by performing additional finetuning on a mix of general-domain and SciRIFF demonstrations. In evaluations on nine held-out scientific tasks, our model -- called SciTulu -- improves over a strong LLM baseline by 28.1% and 6.5% at the 7B and 70B scales respectively, while maintaining general instruction-following performance within 2% of the baseline. We are optimistic that SciRIFF will facilitate the development and evaluation of LLMs to help researchers navigate the ever-growing body of scientific literature. We release our dataset, model checkpoints, and data processing and evaluation code to enable further research.
Problem

Research questions and friction points this paper is trying to address.

Enhancing language model instruction-following for scientific literature analysis
Addressing complex information extraction from diverse scientific research papers
Improving synthesis and verification capabilities across scientific domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-written dataset for scientific instruction-following
Finetuning LLMs with mixed general and scientific instructions
Structured outputs from complex instructions and long contexts
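The mixed finetuning idea above can be sketched as assembling a training set from both general-domain and scientific demonstrations. The function below is a minimal illustrative sketch, not the paper's exact recipe: the sampling scheme, mixture ratio, and example format are assumptions for demonstration purposes.

```python
import random

def mix_instruction_data(general, scientific, science_fraction=0.5,
                         total=1000, seed=0):
    """Build a finetuning mixture of general-domain and scientific
    instruction-following demonstrations.

    Illustrative sketch only: samples with replacement and uses a fixed
    mixture ratio, which is an assumption rather than the paper's method.
    """
    rng = random.Random(seed)
    n_sci = int(total * science_fraction)
    n_gen = total - n_sci
    # Draw the requested number of examples from each pool.
    mixture = [rng.choice(scientific) for _ in range(n_sci)]
    mixture += [rng.choice(general) for _ in range(n_gen)]
    # Shuffle so batches interleave both sources during training.
    rng.shuffle(mixture)
    return mixture
```

A higher `science_fraction` pushes the model toward scientific tasks; keeping some general-domain data in the mix is what lets the adapted model retain its general instruction-following ability.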