AI Summary
This work addresses the challenge that current large language models (LLMs) struggle to effectively comprehend molecular graph structures, and existing graph-LLM alignment approaches rely on static tokens, neglect stereochemistry and substructural context, and require costly LLM fine-tuning. To overcome these limitations, the authors propose EDT-Former, an entropy-guided dynamic token Transformer that generates informative molecular fragment-based tokens on-the-fly, enabling efficient alignment between a graph encoder and a frozen LLM backbone. EDT-Former introduces, for the first time, an entropy-guided mechanism that jointly captures both local and global structural characteristics of molecules, significantly enhancing the efficiency and generalization of multimodal molecular understanding. The method achieves state-of-the-art performance across multiple benchmarks, including MoleculeQA, Mol-Instructions, TDC, and MoleculeNet.
Abstract
Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM bridges often adapt the Q-Former-style connector with fixed-length static tokens, a design originally developed for vision tasks. These designs overlook stereochemistry and substructural context and typically require costly LLM-backbone fine-tuning, limiting efficiency and generalization. We introduce EDT-Former, an Entropy-guided Dynamic Token Transformer that generates tokens aligned with informative molecular patches, thereby preserving both local and global structural features for molecular graph understanding. Beyond prior approaches, EDT-Former enables alignment between frozen graph encoders and LLMs without tuning the LLM backbone (excluding the embedding layer), resulting in computationally efficient fine-tuning, and achieves state-of-the-art results on MoleculeQA, molecule-oriented Mol-Instructions, and property prediction benchmarks (TDC, MoleculeNet), underscoring its effectiveness for scalable and generalizable multimodal molecular understanding.
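The core idea the abstract describes, using entropy to decide which molecular patches become tokens so the token count varies per molecule, can be sketched as follows. This is a hedged illustration only, not the authors' implementation: the per-patch attention scores as input, the keep-low-entropy rule (a peaked attention distribution is treated as a distinctive, informative fragment), and the threshold `tau` are all assumptions made for the sketch.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of raw scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(p):
    # Shannon entropy (nats) of a probability distribution.
    return -sum(q * math.log(q) for q in p if q > 0)

def select_dynamic_tokens(patch_scores, tau=0.5):
    """Entropy-guided dynamic token selection (illustrative sketch).

    patch_scores: one list of raw attention scores per molecular patch.
    A patch is kept as a token when its normalized attention entropy is
    below tau * log(n), i.e. its attention is concentrated (assumed to
    signal an informative fragment). The result is a variable-length
    index list, unlike a fixed-length Q-Former query set.
    """
    kept = []
    for i, scores in enumerate(patch_scores):
        max_h = math.log(len(scores))  # entropy of a uniform distribution
        h = entropy(softmax(scores))
        if h < tau * max_h:
            kept.append(i)
    return kept
```

For example, a patch with one dominant score is kept while a patch with uniform scores is dropped, so different molecules naturally yield different numbers of tokens:

```python
select_dynamic_tokens([[5.0, 0.1, 0.1], [1.0, 1.0, 1.0]])  # keeps only patch 0
```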