🤖 AI Summary
Molecular large language models (LLMs) suffer from scientific unreliability in drug design due to hallucinations, particularly knowledge shortcuts, that compromise factual consistency with molecular science.
Method: We introduce Mol-Hallu, the first hallucination-specific benchmark for molecular understanding, which quantifies logical consistency between generated descriptions and ground-truth molecular properties via scientific entailment modeling and free-text generation evaluation. We further propose HRPP, a fine-tuning-free, architecture-agnostic post-hoc hallucination mitigation mechanism.
Results: Mol-Hallu demonstrates strong interpretability and discriminative power in hallucination assessment. HRPP reduces hallucination rates by an average of 37.2% across multiple molecular LLMs, significantly enhancing the reliability and trustworthiness of generated outputs in critical applications such as drug design.
📝 Abstract
Large language models are increasingly used in scientific domains, especially for molecular understanding and analysis. However, existing models suffer from hallucination issues, leading to errors in drug design and utilization. In this paper, we first analyze the sources of hallucination in LLMs for molecular comprehension tasks, specifically the knowledge shortcut phenomenon observed in the PubChem dataset. To evaluate hallucination in molecular comprehension tasks with computational efficiency, we introduce **Mol-Hallu**, a novel free-form evaluation metric that quantifies the degree of hallucination based on the scientific entailment relationship between generated text and actual molecular properties. Using the Mol-Hallu metric, we reassess and analyze the extent of hallucination in various LLMs performing molecular comprehension tasks. Furthermore, we propose a Hallucination Reduction Post-processing stage (HRPP) to alleviate molecular hallucinations. Experiments demonstrate the effectiveness of HRPP on both decoder-only and encoder-decoder molecular LLMs. Our findings provide critical insights into mitigating hallucination and improving the reliability of LLMs in scientific applications.
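The paper's exact scoring formula is not reproduced above, but the core idea of an entailment-based hallucination metric can be sketched. The snippet below is a minimal, hypothetical illustration: each claim in the generated description is checked against the ground-truth property statements via an entailment predicate, and the hallucination score is the fraction of claims that no property entails. The function names (`mol_hallu_score`, `toy_entails`) and the substring-matching "entailment" stand-in are assumptions for illustration; a real implementation would use a trained scientific NLI model as the entailment predicate.

```python
def mol_hallu_score(generated_claims, true_properties, entails):
    """Toy entailment-based hallucination score (illustrative, not the
    paper's formula): fraction of generated claims that are NOT entailed
    by any ground-truth molecular property statement."""
    if not generated_claims:
        return 0.0
    supported = sum(
        1 for claim in generated_claims
        if any(entails(prop, claim) for prop in true_properties)
    )
    return 1.0 - supported / len(generated_claims)

def toy_entails(premise, claim):
    # Stand-in for a scientific NLI model: naive substring entailment.
    return claim.lower() in premise.lower()

# Toy example: the model claims three properties; only two are supported.
props = ["This molecule is a water-soluble alkaloid with analgesic activity."]
claims = ["water-soluble", "analgesic", "antiviral"]
score = mol_hallu_score(claims, props, toy_entails)
print(round(score, 3))  # 1 of 3 claims unsupported -> 0.333
```

A lower score indicates better factual consistency with the known molecular properties; swapping `toy_entails` for a proper entailment model is the substantive part of any real metric of this kind.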