CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing molecular large language models (LLMs) rely solely on sequential representations such as SMILES, limiting their capacity to capture molecular topology and 3D spatial geometry. To address this, we propose a dual-view fusion framework that, for the first time, jointly incorporates molecular graphs (topology) and molecular images (3D conformations) into LLMs. Our approach introduces a SMILES-guided cross-view prefix mechanism: structural-aware resampling aligns heterogeneous representations, and aligned features are injected as prefix embeddings into the LLM’s context. This design balances expressive power and inference efficiency. We evaluate on three core tasks—molecular description generation, IUPAC name prediction, and property prediction—and achieve significant improvements over state-of-the-art methods. Results demonstrate that synergistic topological–spatial modeling enhances molecular structure understanding both effectively and generally.

📝 Abstract
Recent advances in molecular science have been propelled significantly by large language models (LLMs). However, their effectiveness is limited when relying solely on molecular sequences, which fail to capture the complex structures of molecules. Beyond sequence representation, molecules exhibit two complementary structural views: the first focuses on the topological relationships between atoms, as exemplified by the graph view; the second emphasizes the spatial configuration of molecules, as represented by the image view. The two types of views provide unique insights into molecular structures. To leverage these views collaboratively, we propose CROss-view Prefixes (CROP) to enhance LLMs' molecular understanding through efficient multi-view integration. CROP possesses two advantages: (i) efficiency: by jointly resampling multiple structural views into fixed-length prefixes, it avoids excessive consumption of the LLM's limited context length and allows easy expansion to more views; (ii) effectiveness: by utilizing the LLM's self-encoded molecular sequences to guide the resampling process, it boosts the quality of the generated prefixes. Specifically, our framework features a carefully designed SMILES Guided Resampler for view resampling, and a Structural Embedding Gate for converting the resulting embeddings into the LLM's prefixes. Extensive experiments demonstrate the superiority of CROP in tasks including molecule captioning, IUPAC name prediction and molecule property prediction.
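The abstract's mechanism can be sketched concretely: learnable queries are first guided by the LLM's self-encoded SMILES embeddings, then jointly resample the graph and image view features into a fixed number of prefix vectors, with a scalar gate controlling how much structural signal enters each prefix. The sketch below is a minimal single-head, numpy-only illustration of that data flow; all function and parameter names (`cross_view_prefix`, `n_prefix`, etc.) are hypothetical, and the real model uses trained attention and gating layers rather than random initialization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_prefix(smiles_emb, graph_feats, image_feats, n_prefix=8, seed=0):
    """Resample variable-length view features into fixed-length prefixes.

    smiles_emb:  (Ls, d) LLM self-encoded SMILES tokens (guidance signal)
    graph_feats: (Lg, d) topological (graph) view features
    image_feats: (Li, d) spatial (image) view features
    Returns (n_prefix, d) prefix embeddings for the LLM's context.
    """
    d = smiles_emb.shape[1]
    rng = np.random.default_rng(seed)
    # Learnable queries; randomly initialized here for the sketch.
    queries = rng.normal(scale=d ** -0.5, size=(n_prefix, d))
    # SMILES guidance: queries first cross-attend to the sequence embeddings.
    guided = queries + softmax(queries @ smiles_emb.T / np.sqrt(d)) @ smiles_emb
    # Joint resampling: both structural views are attended over at once,
    # so adding more views only extends this concatenation.
    views = np.concatenate([graph_feats, image_feats], axis=0)
    resampled = softmax(guided @ views.T / np.sqrt(d)) @ views
    # Structural embedding gate: a per-prefix scalar blends structure in.
    gate = 1.0 / (1.0 + np.exp(-(resampled * guided).sum(-1, keepdims=True)))
    return guided + gate * resampled
```

Because the output length is fixed at `n_prefix` regardless of molecule size, the prefixes consume a constant slice of the LLM's context window, which is the efficiency argument made in the abstract.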
Problem

Research questions and friction points this paper is trying to address.

Enhancing molecular LLMs by integrating topological and spatial structures
Overcoming sequence-only limitations in molecular representation via multi-view prefixes
Improving efficiency and effectiveness in molecular tasks with cross-view integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates topological and spatial molecular views
Uses cross-view prefixes for multi-view integration
Features SMILES Guided Resampler and Structural Embedding Gate
Jianting Tang
University of Science and Technology of China
Yubo Wang
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China
Haoyu Cao
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China
Linli Xu
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China