Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the overreliance on very large language models in scientific knowledge discovery, which limits reproducibility and accessibility. The authors propose a lightweight retrieval-augmented framework with a task-aware retrieval routing mechanism that dynamically selects a specialized retrieval strategy per query and integrates full-text content with structured scholarly metadata. Coupled with a small instruction-tuned language model, the framework generates citation-grounded responses. Experiments show that the method substantially improves small-model performance across diverse tasks (scholarly question answering, biomedical question answering, and text summarization), indicating that well-designed retrieval can partially compensate for limited model capacity. The findings reveal a complementary relationship between retrieval design and model scale, offering a practical path toward efficient, reproducible academic AI assistants.
📝 Abstract
Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters. Such reliance limits reproducibility and accessibility for the research community. In this work, we ask a simple question: do we need bigger models for scientific applications? Specifically, we investigate to what extent carefully designed retrieval pipelines can compensate for reduced model scale in scientific applications. We design a lightweight retrieval-augmented framework that performs task-aware routing to select specialized retrieval strategies based on the input query. The system further integrates evidence from full-text scientific papers and structured scholarly metadata, and employs compact instruction-tuned language models to generate responses with citations. We evaluate the framework across several scholarly tasks, focusing on scholarly question answering (QA), including single- and multi-document scenarios, as well as biomedical QA under domain shift and scientific text compression. Our findings demonstrate that retrieval and model scale are complementary rather than interchangeable. While retrieval design can partially compensate for smaller models, model capacity remains important for complex reasoning tasks. This work highlights retrieval and task-aware design as key factors for building practical and reproducible scholarly assistants.
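The pipeline the abstract describes (route the query to a task type, apply a task-specific retrieval strategy over full text plus metadata, then prompt a compact model with citation-tagged evidence) can be sketched as follows. This is a minimal illustrative sketch under assumed names; the router, retrievers, and prompt format here are placeholders, not the authors' actual implementation, and the real router is presumably learned rather than keyword-based.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    paper_id: str
    text: str       # passage from the full-text body of a paper
    metadata: dict  # structured scholarly metadata (title, venue, year, ...)

def route_task(query: str) -> str:
    """Toy keyword router standing in for the paper's task-aware routing."""
    q = query.lower()
    if "summarize" in q or "tl;dr" in q:
        return "summarization"
    if any(term in q for term in ("gene", "protein", "disease", "drug")):
        return "biomedical_qa"
    return "scholarly_qa"

def retrieve(query: str, task: str, corpus: list[Evidence]) -> list[Evidence]:
    """Task-conditioned retrieval: summarization scores full text only;
    QA tasks also match against structured metadata. Lexical overlap is a
    stand-in for whatever dense/sparse retrievers the framework routes to."""
    def score(ev: Evidence) -> int:
        fields = ev.text if task == "summarization" else ev.text + " " + str(ev.metadata)
        return sum(w in fields.lower() for w in query.lower().split())
    return sorted(corpus, key=score, reverse=True)[:3]

def build_prompt(query: str, evidence: list[Evidence]) -> str:
    """Citation-grounded prompt: each passage carries a [paper_id] marker
    so the small model can cite its sources inline."""
    ctx = "\n".join(f"[{ev.paper_id}] {ev.text}" for ev in evidence)
    return f"Answer with citations like [id].\nContext:\n{ctx}\nQuestion: {query}"
```

A generation call with a small instruction-tuned model would then consume `build_prompt(...)`; the key design point is that only the retrieval side is task-aware, so the same compact generator serves all tasks.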
Problem

Research questions and friction points this paper is trying to address.

large language models
scientific applications
model scale
reproducibility
accessibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

task-aware retrieval
retrieval-augmented generation
small language models
scholarly question answering
scientific knowledge discovery