The limits of bio-molecular modeling with large language models: a cross-scale evaluation

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the lack of systematic evaluation benchmarks for cross-scale modeling in biomolecular systems. The authors propose BioMol-LLM-Bench, the first comprehensive benchmark encompassing 26 tasks across four difficulty levels, integrated with external computational tools, to rigorously evaluate 13 prominent large language models. Their analysis reveals that chain-of-thought reasoning yields limited benefits, while hybrid Mamba-Attention architectures demonstrate superior performance in long-sequence modeling. Although supervised fine-tuning enhances domain-specific capabilities, it concurrently compromises generalization. The findings indicate that current models perform adequately on classification tasks but remain notably deficient in handling complex regression problems.
📝 Abstract
The modeling of bio-molecular systems across molecular scales remains a central challenge in scientific research. Large language models (LLMs) are increasingly applied to bio-molecular discovery, yet systematic evaluation across multi-scale biological problems and rigorous assessment of their tool-augmented capabilities remain limited. We reveal a systematic gap between LLM performance and mechanistic understanding through a proposed cross-scale bio-molecular benchmark, BioMol-LLM-Bench: a unified framework comprising 26 downstream tasks spanning four distinct difficulty levels, with integrated computational tools for more comprehensive evaluation. Evaluation of 13 representative models reveals four main findings: chain-of-thought data provides limited benefit and may even reduce performance on biological tasks; hybrid Mamba-attention architectures are more effective for long bio-molecular sequences; supervised fine-tuning improves specialization at the cost of generalization; and current LLMs perform well on classification tasks but remain weak on challenging regression tasks. Together, these findings provide practical guidance for future LLM-based modeling of molecular systems.
Problem

Research questions and friction points this paper is trying to address.

bio-molecular modeling
large language models
cross-scale evaluation
molecular systems
LLM benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-scale evaluation
BioMol-LLM-Bench
tool-augmented LLMs
hybrid mamba-attention
molecular regression tasks
Yaxin Xu
Southern University of Science and Technology, Shenzhen, 518055, China.
Yue Zhou
Pengcheng Laboratory, Shenzhen, 518055, China.
Tianyu Zhao
Institute of Mechanics, Chinese Academy of Sciences, Beijing, 100190, China.
Fengwei An
Southern University of Science and Technology, Shenzhen, 518055, China.
Zhixiang Ren
Pengcheng Laboratory, Shenzhen, 518055, China.