Attribution in Scientific Literature: New Benchmark and Methods

📅 2024-05-03

📈 Citations: 6

✨ Influential: 0


🤖 AI Summary

Citation generation with large language models (LLMs) is limited by citation ambiguity and high hallucination rates, both of which undermine reliability in research applications. This paper introduces REASONS, a benchmark with sentence-level attribution annotations across 12 scientific domains drawn from arXiv. The evaluation framework covers two scenarios: *indirect querying* (matching a sentence to its paper title) and *direct querying* (author attribution), both enhanced with contextual metadata. Metadata augmentation lowers hallucination rates across all tasks, and retrieval-augmented generation (RAG) with Mistral cuts hallucination rates on indirect queries by 42% while keeping precision competitive with larger models. Adversarial testing shows that current LLMs struggle to link paper titles to abstracts. REASONS offers a challenging benchmark for building reliable, trustworthy LLMs for scientific applications.


📝 Abstract

Large language models (LLMs) present a promising yet challenging frontier for automated source citation in scientific communication. Previous approaches to citation generation have been limited by citation ambiguity and LLM overgeneralization. We introduce REASONS, a novel dataset with sentence-level annotations across 12 scientific domains from arXiv. Our evaluation framework covers two key citation scenarios: indirect queries (matching sentences to paper titles) and direct queries (author attribution), both enhanced with contextual metadata. We conduct extensive experiments with models such as GPT-o1, GPT-4o, GPT-3.5, DeepSeek, and other smaller models like Perplexity AI (7B). While top-tier LLMs achieve high performance in sentence attribution, they struggle with high hallucination rates, a key metric for scientific reliability. Our metadata-augmented approach reduces hallucination rates across all tasks, offering a promising direction for improvement. Retrieval-augmented generation (RAG) with Mistral improves performance in indirect queries, reducing hallucination rates by 42% and maintaining competitive precision with larger models. However, adversarial testing highlights challenges in linking paper titles to abstracts, revealing fundamental limitations in current LLMs. REASONS provides a challenging benchmark for developing reliable and trustworthy LLMs in scientific applications.
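
To make the two query scenarios concrete, here is a minimal sketch of how indirect and direct prompts might be assembled with optional metadata. Everything in it is an assumption for illustration: the `Example` record, its field names, the prompt wording, and the choice to give the paper title as input for direct queries are hypothetical and are not taken from the REASONS dataset or paper.

```python
from dataclasses import dataclass, field


@dataclass
class Example:
    """One annotated sentence. All field names are hypothetical."""
    sentence: str
    title: str
    authors: list
    metadata: dict = field(default_factory=dict)


def _metadata_block(ex):
    # Render metadata as "key: value" lines, sorted so prompts are deterministic.
    return "\n".join(f"{k}: {v}" for k, v in sorted(ex.metadata.items()))


def indirect_query(ex, with_metadata=False):
    """Indirect scenario: given a sentence, ask for the paper title."""
    parts = []
    if with_metadata and ex.metadata:
        parts.append(_metadata_block(ex))
    parts.append(f'Sentence: "{ex.sentence}"')
    parts.append("Which paper is this sentence from? Answer with the title only.")
    return "\n".join(parts)


def direct_query(ex, with_metadata=False):
    """Direct scenario (assumed input: the paper title): ask for the authors."""
    parts = []
    if with_metadata and ex.metadata:
        parts.append(_metadata_block(ex))
    parts.append(f'Paper title: "{ex.title}"')
    parts.append("Who are the authors of this paper? Answer with names only.")
    return "\n".join(parts)
```

Calling `indirect_query(ex, with_metadata=True)` simply prepends the metadata lines to the prompt, which is one plausible reading of "enhanced with contextual metadata"; the actual prompt templates may differ.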

Problem

Research questions and friction points this paper is trying to address.

Addressing citation ambiguity and LLM overgeneralization in scientific attribution

Reducing hallucination rates in LLMs for reliable scientific citation

Improving indirect and direct query performance in citation generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

REASONS dataset with sentence-level annotations

Metadata-augmented approach reduces hallucination rates

Retrieval-augmented generation improves indirect queries
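
The abstract reports precision and hallucination rate without defining them, so the following is only a sketch under stated assumptions: a prediction counts as a hallucination when the returned title does not exist in the known corpus, precision is the share of predictions that match the gold title, and the 42% figure is read as a relative reduction. All function names are hypothetical, and the paper's own definitions may differ.

```python
def normalize(title):
    # Case-fold and collapse whitespace so trivial formatting differences match.
    return " ".join(title.lower().split())


def precision(predictions, gold):
    """Share of predicted titles that equal the gold title (assumed definition)."""
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predictions, gold))
    return hits / len(predictions)


def hallucination_rate(predictions, known_titles):
    """Share of predicted titles absent from the corpus (assumed definition)."""
    if not predictions:
        return 0.0
    known = {normalize(t) for t in known_titles}
    misses = sum(normalize(p) not in known for p in predictions)
    return misses / len(predictions)


def relative_reduction(baseline, improved):
    """Relative drop from baseline to improved; assumes baseline > 0."""
    return (baseline - improved) / baseline
```

Under these assumptions, a baseline hallucination rate of 0.50 falling to 0.29 would be a 42% relative reduction; these two rates are illustrative numbers, not results from the paper.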
