SciCore-Mol: Augmenting Large Language Models with Pluggable Molecular Cognition Modules

📅 2026-05-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge that large language models (LLMs) struggle to reason effectively over heterogeneous scientific data such as molecules, primarily due to the semantic gap between discrete textual symbols and continuous or topological chemical representations. To bridge this gap, the authors propose a modular, plug-and-play cognitive architecture comprising three core components: topology-aware encoding, latent diffusion-based generation, and reaction-aware reasoning. These modules are deeply integrated with an LLM through learnable interfaces, systematically endowing it with molecular-level expertise. Moving beyond conventional text-centric paradigms, the framework achieves substantial performance gains across molecular understanding, generation, reaction prediction, and knowledge synthesis tasks. The resulting 8B-parameter open-source system matches or even surpasses leading closed-source LLMs on multiple benchmarks.
📝 Abstract
Large Language Models (LLMs) are central to the one-for-all intelligent paradigm, but they face a fundamental challenge when dealing with heterogeneous scientific data such as molecules: the inherent gap between discrete linguistic symbols and topological molecular or continuous reaction data leads to significant information loss and semantic noise in text-based reasoning. We propose SciCore-Mol, a modular framework that bridges this gap through three deeply integrated pluggable cognitive modules: a topology-aware perception module, a latent diffusion-based molecular generation module, and a reaction-aware reasoning module. Each module is coupled to the LLM backbone through learned representation interfaces, enabling richer information exchange than is possible with text-only tool feedback. Our experiments on diverse chemical tasks demonstrate that SciCore-Mol achieves strong comprehensive performance across molecular understanding, generation, reaction prediction, and general chemistry knowledge, with an 8B-parameter open-source system that is competitive with and in several dimensions surpasses proprietary large models. This work provides a systematic blueprint for equipping LLMs with scientific expertise through decoupled, pluggable, and flexibly orchestrated modules, with direct implications for drug design, chemical synthesis, and broader scientific discovery.
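The abstract does not spell out how a module is "coupled to the LLM backbone through learned representation interfaces", so the following is only a minimal sketch of one plausible reading: each module produces a feature vector, and a learned projection turns it into a soft token in the backbone's embedding space that is placed ahead of the text embeddings. Every name here (`LinearInterface`, `PluggableModule`, `Backbone`, `toy_topology_encoder`) is hypothetical and not taken from the paper, the projection weights are random stand-ins for trained parameters, and the encoder is a toy character counter rather than a real topology-aware model.

```python
import random
from typing import Callable, Dict, List

Vector = List[float]


class LinearInterface:
    """Hypothetical interface: projects module features into the LLM embedding space."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        self.in_dim = in_dim
        self.out_dim = out_dim
        rng = random.Random(seed)
        # Random values stand in for weights that would be learned during training.
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]

    def project(self, features: Vector) -> Vector:
        if len(features) != self.in_dim:
            raise ValueError("feature size does not match interface input size")
        return [sum(w * x for w, x in zip(row, features)) for row in self.weights]


class PluggableModule:
    """A named encoder plus the interface that connects it to the backbone."""

    def __init__(self, name: str, encode: Callable[[str], Vector],
                 interface: LinearInterface) -> None:
        self.name = name
        self.encode = encode
        self.interface = interface


class Backbone:
    """Stand-in for an LLM that consumes a sequence of embedding vectors."""

    def __init__(self, embed_dim: int) -> None:
        self.embed_dim = embed_dim
        self.modules: Dict[str, PluggableModule] = {}

    def plug(self, module: PluggableModule) -> None:
        if module.interface.out_dim != self.embed_dim:
            raise ValueError("interface output size must equal the embedding size")
        self.modules[module.name] = module

    def unplug(self, name: str) -> None:
        self.modules.pop(name, None)

    def build_inputs(self, text_embeddings: List[Vector], molecule: str) -> List[Vector]:
        # One soft token per attached module, placed before the text embeddings.
        soft_tokens = [m.interface.project(m.encode(molecule))
                       for m in self.modules.values()]
        return soft_tokens + text_embeddings


def toy_topology_encoder(smiles: str) -> Vector:
    # Toy features only: letter count, ring-closure digit count, branch count.
    return [float(sum(c.isalpha() for c in smiles)),
            float(sum(c.isdigit() for c in smiles)),
            float(smiles.count("("))]


backbone = Backbone(embed_dim=4)
backbone.plug(PluggableModule("topology", toy_topology_encoder,
                              LinearInterface(in_dim=3, out_dim=4)))
text = [[0.0] * 4, [0.0] * 4]          # two placeholder text-token embeddings
inputs = backbone.build_inputs(text, "c1ccccc1O")
print(len(inputs), len(inputs[0]))
```

The point of the sketch is the decoupling: a module can be attached or detached with `plug` and `unplug` without touching the backbone, and what crosses the boundary is a vector rather than a text string, which is the contrast the abstract draws with text-only tool feedback.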
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Molecular Data
Heterogeneous Scientific Data
Semantic Noise
Information Loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

pluggable cognitive modules
molecular representation
latent diffusion
reaction-aware reasoning
LLM augmentation