CROP: Integrating Topological and Spatial Structures via Cross-View Prefixes for Molecular LLMs

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing molecular large language models (LLMs) rely solely on sequential representations such as SMILES, limiting their capacity to capture molecular topology and 3D spatial geometry. To address this, we propose a dual-view fusion framework that, for the first time, jointly incorporates molecular graphs (topology) and molecular images (3D conformations) into LLMs. Our approach introduces a SMILES-guided cross-view prefix mechanism: structural-aware resampling aligns heterogeneous representations, and aligned features are injected as prefix embeddings into the LLM’s context. This design balances expressive power and inference efficiency. We evaluate on three core tasks—molecular description generation, IUPAC name prediction, and property prediction—and achieve significant improvements over state-of-the-art methods. Results demonstrate that synergistic topological–spatial modeling enhances molecular structure understanding both effectively and generally.

📝 Abstract
Recent advances in molecular science have been propelled significantly by large language models (LLMs). However, their effectiveness is limited when relying solely on molecular sequences, which fail to capture the complex structures of molecules. Beyond sequence representation, molecules exhibit two complementary structural views: the first focuses on the topological relationships between atoms, as exemplified by the graph view; the second emphasizes the spatial configuration of molecules, as represented by the image view. The two types of views provide unique insights into molecular structures. To leverage these views collaboratively, we propose CROss-view Prefixes (CROP) to enhance LLMs' molecular understanding through efficient multi-view integration. CROP possesses two advantages: (i) efficiency: by jointly resampling multiple structural views into fixed-length prefixes, it avoids excessive consumption of the LLM's limited context length and allows easy expansion to more views; (ii) effectiveness: by utilizing the LLM's self-encoded molecular sequences to guide the resampling process, it boosts the quality of the generated prefixes. Specifically, our framework features a carefully designed SMILES Guided Resampler for view resampling, and a Structural Embedding Gate for converting the resulting embeddings into the LLM's prefixes. Extensive experiments demonstrate the superiority of CROP in tasks including molecule captioning, IUPAC name prediction and molecule property prediction.
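The abstract's mechanism can be sketched concretely: learnable queries are first guided by the LLM's self-encoded SMILES embeddings, then jointly resample the graph and image view features into a fixed number of prefix vectors, with a scalar gate controlling how much structural signal enters each prefix. The sketch below is a minimal single-head, numpy-only illustration of that data flow; all function and parameter names (`cross_view_prefix`, `n_prefix`, etc.) are hypothetical, and the real model uses trained attention and gating layers rather than random initialization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_prefix(smiles_emb, graph_feats, image_feats, n_prefix=8, seed=0):
    """Resample variable-length view features into fixed-length prefixes.

    smiles_emb:  (Ls, d) LLM self-encoded SMILES tokens (guidance signal)
    graph_feats: (Lg, d) topological (graph) view features
    image_feats: (Li, d) spatial (image) view features
    Returns (n_prefix, d) prefix embeddings for the LLM's context.
    """
    d = smiles_emb.shape[1]
    rng = np.random.default_rng(seed)
    # Learnable queries; randomly initialized here for the sketch.
    queries = rng.normal(scale=d ** -0.5, size=(n_prefix, d))
    # SMILES guidance: queries first cross-attend to the sequence embeddings.
    guided = queries + softmax(queries @ smiles_emb.T / np.sqrt(d)) @ smiles_emb
    # Joint resampling: both structural views are attended over at once,
    # so adding more views only extends this concatenation.
    views = np.concatenate([graph_feats, image_feats], axis=0)
    resampled = softmax(guided @ views.T / np.sqrt(d)) @ views
    # Structural embedding gate: a per-prefix scalar blends structure in.
    gate = 1.0 / (1.0 + np.exp(-(resampled * guided).sum(-1, keepdims=True)))
    return guided + gate * resampled
```

Because the output length is fixed at `n_prefix` regardless of molecule size, the prefixes consume a constant slice of the LLM's context window, which is the efficiency argument made in the abstract.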
Problem

Research questions and friction points this paper is trying to address.

Enhancing molecular LLMs by integrating topological and spatial structures
Overcoming sequence-only limitations in molecular representation via multi-view prefixes
Improving efficiency and effectiveness in molecular tasks with cross-view integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates topological and spatial molecular views
Uses cross-view prefixes for multi-view integration
Features SMILES Guided Resampler and Structural Embedding Gate
Jianting Tang
University of Science and Technology of China
Yubo Wang
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China
Haoyu Cao
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China
Linli Xu
University of Science and Technology of China, State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China