Attribution in Scientific Literature: New Benchmark and Methods

📅 2024-05-03
📈 Citations: 6
✨ Influential: 0

🤖 AI Summary
Scientific citation generation faces two challenges: citation ambiguity and high hallucination rates in large language models (LLMs), both of which undermine reliability in research applications. To address them, we introduce REASONS, a fine-grained evaluation benchmark with sentence-level attribution annotations across 12 scientific domains from arXiv. We propose a dual-scenario evaluation framework: *indirect querying* (sentence → paper title) and *direct querying* (sentence → authors). Methodologically, we augment queries with contextual metadata to suppress hallucinations and apply retrieval-augmented generation (RAG) with the Mistral model. RAG reduces hallucination rates by 42% in indirect querying while keeping precision competitive with larger models. Adversarial testing reveals a fundamental LLM limitation in linking paper titles to their abstracts. REASONS provides a discipline-diverse, sentence-level citation benchmark and a hallucination-mitigated pipeline for trustworthy scientific AI.

📝 Abstract
Large language models (LLMs) present a promising yet challenging frontier for automated source citation in scientific communication. Previous approaches to citation generation have been limited by citation ambiguity and LLM overgeneralization. We introduce REASONS, a novel dataset with sentence-level annotations across 12 scientific domains from arXiv. Our evaluation framework covers two key citation scenarios: indirect queries (matching sentences to paper titles) and direct queries (author attribution), both enhanced with contextual metadata. We conduct extensive experiments with models such as o1, GPT-4o, GPT-3.5, and DeepSeek, as well as smaller models such as Perplexity AI (7B). While top-tier LLMs achieve high performance in sentence attribution, they suffer from high hallucination rates, a key metric for scientific reliability. Our metadata-augmented approach reduces hallucination rates across all tasks, offering a promising direction for improvement. Retrieval-augmented generation (RAG) with Mistral improves performance on indirect queries, reducing hallucination rates by 42% while maintaining precision competitive with larger models. However, adversarial testing highlights difficulties in linking paper titles to abstracts, revealing fundamental limitations in current LLMs. REASONS provides a challenging benchmark for developing reliable and trustworthy LLMs in scientific applications.
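The abstract reports precision and hallucination rate for indirect queries but does not define either metric here. As a minimal sketch, assuming precision means "correct titles / answered queries" and a hallucination means "a generated title that matches no real paper in a reference corpus", the two could be scored as below. The function names, the title normalization, and the `known_titles` corpus are hypothetical illustrations, not the paper's evaluation code.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


def normalize_title(title: str) -> str:
    # Lowercase, replace punctuation with spaces, and collapse whitespace so that
    # formatting differences ("Attention is all you need." vs "Attention Is All You Need")
    # are not counted as errors. This normalization is an assumption.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


@dataclass
class Scores:
    precision: float           # correct answers / answered queries (assumed definition)
    hallucination_rate: float  # non-existent titles / answered queries (assumed definition)


def score_indirect_queries(
    predictions: Sequence[Optional[str]],
    gold_titles: Sequence[str],
    known_titles: Sequence[str],
) -> Scores:
    # known_titles is a hypothetical stand-in for "papers that actually exist",
    # e.g. every title in the benchmark's reference corpus.
    known = {normalize_title(t) for t in known_titles}
    answered = correct = hallucinated = 0
    for pred, gold in zip(predictions, gold_titles):
        if not pred:
            continue  # an abstention is neither correct nor hallucinated
        answered += 1
        norm_pred = normalize_title(pred)
        if norm_pred == normalize_title(gold):
            correct += 1
        elif norm_pred not in known:
            hallucinated += 1  # the title matches no real paper
        # otherwise: a real but wrong paper (a misattribution, not a hallucination)
    if answered == 0:
        return Scores(precision=0.0, hallucination_rate=0.0)
    return Scores(correct / answered, hallucinated / answered)
```

Separating "real but wrong paper" from "paper that does not exist" keeps the two failure modes distinct, which matters when a method such as metadata augmentation lowers hallucinations without necessarily raising precision.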
Problem

Research questions and friction points this paper is trying to address.

Addressing citation ambiguity and LLM overgeneralization in scientific attribution
Reducing hallucination rates in LLMs for reliable scientific citation
Improving indirect and direct query performance in citation generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

REASONS dataset with sentence-level annotations
Metadata-augmented approach reduces hallucination rates
Retrieval-augmented generation improves indirect queries
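The retrieval and metadata ideas listed above can be sketched roughly as follows. This is a hypothetical illustration, not the authors' pipeline: it swaps in simple bag-of-words cosine similarity where a real RAG system would likely use a dense embedding retriever, the `Paper` fields `category` and `year` are assumed metadata, and the final call to a generator such as Mistral is omitted.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    abstract: str
    category: str  # hypothetical metadata field, e.g. an arXiv category such as "cs.CL"
    year: int      # hypothetical metadata field


def bag_of_words(text: str) -> Counter:
    # Lowercased alphanumeric tokens with their counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(sentence: str, corpus: list[Paper], k: int = 3) -> list[Paper]:
    # Rank papers by similarity between the sentence and each title + abstract.
    query = bag_of_words(sentence)
    ranked = sorted(
        corpus,
        key=lambda p: cosine(query, bag_of_words(p.title + " " + p.abstract)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(sentence: str, category: str, candidates: list[Paper]) -> str:
    # Metadata-augmented prompt: the model picks from retrieved candidates and may
    # answer 'unknown' instead of inventing a title.
    lines = [
        "Which paper does the sentence cite? Reply with one candidate's exact title, or 'unknown'.",
        f"Sentence: {sentence}",
        f"Field (metadata): {category}",
        "Candidates:",
    ]
    for i, paper in enumerate(candidates, 1):
        lines.append(f"{i}. {paper.title} ({paper.year}): {paper.abstract}")
    return "\n".join(lines)
```

Constraining the generator to retrieved candidates, with an explicit "unknown" option, is one plausible reason retrieval would lower hallucination rates on indirect queries: the model no longer has to recall a title from memory.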