Scientific Knowledge-driven Decoding Constraints Improving the Reliability of LLMs

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limited reliability of large language models (LLMs) in specialized domains due to hallucinations and their inability to effectively leverage highly condensed scientific theories and rules. To overcome this, we propose SciDC, a novel method that automatically distills domain-specific scientific knowledge into multi-level, standardized constraint rules and integrates them into LLM decoding through an extensible framework, imposing structured, hard constraints on generated content. By combining LLM-based automatic rule extraction, hierarchical knowledge representation, and constrained sequence generation, SciDC significantly mitigates hallucinations and enhances domain-specific reasoning. Evaluated on tasks including industrial formulation design, clinical oncology diagnosis, and retrosynthetic planning, our approach achieves an average accuracy improvement of 12% over baseline methods.
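The core mechanism described above — masking out candidate tokens that violate domain rules during decoding — can be illustrated with a minimal toy sketch. This is not SciDC's implementation; the function names, the rule format (predicates over the prefix and candidate token), and the toy scorer are all hypothetical, standing in for a real LLM's logits and the paper's multi-level rules.

```python
# Toy sketch of rule-constrained greedy decoding (illustrative only; all
# names and the rule representation here are hypothetical, not SciDC's).

def constrained_greedy_decode(score_fn, vocab, rules, max_len=5):
    """Greedy decoding where tokens violating any rule are masked out."""
    seq = []
    for _ in range(max_len):
        # Score every vocabulary token given the current prefix.
        scores = {tok: score_fn(seq, tok) for tok in vocab}
        # Hard constraint: drop tokens that any rule rejects for this prefix.
        allowed = {t: s for t, s in scores.items()
                   if all(rule(seq, t) for rule in rules)}
        if not allowed:
            break  # no admissible continuation remains
        seq.append(max(allowed, key=allowed.get))
    return seq

# Toy example: a rule forbidding immediate repetition of the previous token.
vocab = ["A", "B", "X"]
no_repeat = lambda prefix, tok: not prefix or prefix[-1] != tok
# Toy scorer with fixed preferences X > A > B (a real system would use logits).
prefs = {"X": 3.0, "A": 2.0, "B": 1.0}
score_fn = lambda prefix, tok: prefs[tok]

out = constrained_greedy_decode(score_fn, vocab, [no_repeat], max_len=3)
print(out)  # the top-scoring token "X" is masked whenever it would repeat
```

Unconstrained greedy decoding would emit `["X", "X", "X"]`; with the rule active, the decoder is forced onto the next-best admissible token at each step, which is the hard-constraint behavior the summary describes.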
📝 Abstract
Large language models (LLMs) have shown strong knowledge reserves and task-solving capabilities, but still suffer from severe hallucination, hindering their practical application. Although scientific theories and rules can efficiently direct the behavior of human practitioners, LLMs still do not sufficiently exploit this highly condensed knowledge through training or prompting. To address this issue, we propose SciDC, an LLM generation method that integrates subject-specific knowledge as strong constraints. By using strong LLMs to automatically convert flexible knowledge into multi-layered, standardized rules, we build an extensible framework that effectively constrains model generation on domain tasks. Experiments on scientific tasks, including industrial formulation design, clinical tumor diagnosis, and retrosynthesis planning, consistently demonstrate the effectiveness of our method, achieving a 12% accuracy improvement on average over vanilla generation. We further discuss the potential of LLMs for automatically and inductively summarizing highly condensed knowledge, looking ahead to practical solutions for accelerating the overall scientific research process. All code for this paper is available at https://github.com/Maotian-Ma/SciDC.
Problem

Research questions and friction points this paper is trying to address.

hallucination
scientific knowledge
large language models
decoding constraints
domain-specific reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scientific Knowledge Constraints
Hallucination Mitigation
LLM Decoding
Knowledge Integration
Rule-based Generation