FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models

📅 2025-08-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing chemistry datasets for large language models (LLMs) predominantly target molecule-level property prediction and overlook fine-grained functional group (FG) information, which limits interpretability and structure-sensitive reasoning.

Method: We introduce FGBench, a large-scale FG-level molecular property reasoning benchmark comprising 625K questions over 245 functional groups. Functional groups are annotated and localized within each molecule's structure (parsed from SMILES), and the questions cover three FG-level reasoning categories: single-FG impacts, multi-FG interactions, and direct molecular comparisons. The benchmark spans both regression and classification tasks, and the FG localization keeps the data interoperable for multimodal applications.

Contribution/Results: On 7K curated samples, state-of-the-art LLMs show significant deficiencies in FG-level reasoning. FGBench thus serves as a diagnostic resource for advancing structural understanding, interpretability, and chemically grounded reasoning in LLMs.
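Scoring an LLM on regression and classification questions first requires turning free-text replies into numbers or labels. The Python sketch below shows one plausible way to do that. The parsing heuristics, the choice of mean absolute error (MAE) for regression and accuracy for classification, and every function name here (`parse_number`, `parse_label`, `score_regression`, `score_classification`) are assumptions for illustration, not the FGBench evaluation code.

```python
import math
import re
from typing import Optional, Sequence, Tuple

# Assumption: a numeric answer is the LAST number in the reply
# (plain decimals or scientific notation such as 1e-3).
_NUMBER = re.compile(r"[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?")


def parse_number(reply: str) -> Optional[float]:
    """Return the last numeric token in a free-text reply, or None."""
    matches = _NUMBER.findall(reply)
    return float(matches[-1]) if matches else None


def parse_label(reply: str, labels: Sequence[str] = ("yes", "no")) -> Optional[str]:
    """Return the allowed label that appears first as a whole word, or None."""
    text = reply.lower()
    hits = [
        (m.start(), label)
        for label in labels
        for m in re.finditer(rf"\b{re.escape(label)}\b", text)
    ]
    return min(hits)[1] if hits else None


def score_regression(replies: Sequence[str], targets: Sequence[float]) -> Tuple[float, float]:
    """MAE over parseable replies, plus the fraction of replies that parsed."""
    if not targets:
        raise ValueError("no targets to score")
    errors = []
    for reply, target in zip(replies, targets):
        value = parse_number(reply)
        if value is not None:
            errors.append(abs(value - target))
    mae = sum(errors) / len(errors) if errors else math.nan
    return mae, len(errors) / len(targets)


def score_classification(replies: Sequence[str], targets: Sequence[str],
                         labels: Sequence[str] = ("yes", "no")) -> float:
    """Accuracy; a reply with no recognizable label counts as wrong."""
    if not targets:
        raise ValueError("no targets to score")
    correct = sum(parse_label(r, labels) == t for r, t in zip(replies, targets))
    return correct / len(targets)
```

For example, `score_regression(["The change is about -0.5", "Answer: 1.2"], [-0.4, 1.0])` gives an MAE of about 0.15 with every reply parsed. Taking the last number is a crude heuristic: SMILES ring-closure digits or units echoed in a reply can fool it, which is why the sketch reports parse coverage alongside the error.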

📝 Abstract

Large language models (LLMs) have gained significant attention in chemistry. However, most existing datasets center on molecular-level property prediction and overlook the role of fine-grained functional group (FG) information. Incorporating FG-level data can provide valuable prior knowledge that links molecular structures with textual descriptions, which can be used to build more interpretable, structure-aware LLMs for reasoning on molecule-related tasks. Moreover, LLMs can learn from such fine-grained information to uncover hidden relationships between specific functional groups and molecular properties, thereby advancing molecular design and drug discovery. Here, we introduce FGBench, a dataset comprising 625K molecular property reasoning problems with functional group information. Functional groups are precisely annotated and localized within each molecule, which ensures the dataset's interoperability, thereby facilitating further multimodal applications. FGBench includes both regression and classification tasks on 245 different functional groups across three categories for molecular property reasoning: (1) single functional group impacts, (2) multiple functional group interactions, and (3) direct molecular comparisons. Benchmarking state-of-the-art LLMs on 7K curated data points indicates that current LLMs struggle with FG-level property reasoning, highlighting the need to enhance reasoning capabilities in LLMs for chemistry tasks. We anticipate that the methodology employed in FGBench to construct datasets with functional group-level information will serve as a foundational framework for generating new question-answer pairs, enabling LLMs to better understand fine-grained molecular structure-property relationships. The dataset and evaluation code are available at https://github.com/xuanliugit/FGBench.
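Telling single functional group impacts apart from multiple functional group interactions comes down to comparing the FG annotations of two molecules. The sketch below assumes a hypothetical annotation format (each FG name mapped to one tuple of atom indices per occurrence, in SMILES atom order) and a hypothetical rule: exactly one FG edit marks a single-FG question, while two or more edits mark a multi-FG question. The names `FGAnnotation`, `fg_counts`, `fg_diff`, and `pair_category` are illustrative; FGBench's actual annotation pipeline and category criteria may differ.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical annotation format: FG name -> one tuple of atom indices per
# occurrence, using the atom order of the molecule's SMILES string.
FGAnnotation = Dict[str, List[Tuple[int, ...]]]


def fg_counts(annotation: FGAnnotation) -> Counter:
    """Number of occurrences of each functional group."""
    return Counter({name: len(hits) for name, hits in annotation.items() if hits})


def fg_diff(ref: FGAnnotation, target: FGAnnotation) -> Tuple[Counter, Counter]:
    """FG occurrences added and removed when going from ref to target."""
    ref_counts, target_counts = fg_counts(ref), fg_counts(target)
    # Counter subtraction keeps only positive counts.
    return target_counts - ref_counts, ref_counts - target_counts


def pair_category(ref: FGAnnotation, target: FGAnnotation) -> str:
    """Assumed rule: one FG edit -> single-FG impact; more -> multi-FG interaction."""
    added, removed = fg_diff(ref, target)
    n_edits = sum(added.values()) + sum(removed.values())
    if n_edits == 0:
        return "no_fg_change"
    if n_edits == 1:
        return "single_fg_impact"
    return "multi_fg_interaction"


# Illustrative, hand-written annotations (atom indices follow the SMILES order).
phenol = {"hydroxyl": [(0,)], "benzene": [(1, 2, 3, 4, 5, 6)]}       # Oc1ccccc1
nitrophenol = {"hydroxyl": [(0,)], "benzene": [(1, 2, 3, 4, 5, 6)],
               "nitro": [(7, 8, 9)]}                                  # Oc1ccc(cc1)[N+](=O)[O-]
aniline = {"amine": [(0,)], "benzene": [(1, 2, 3, 4, 5, 6)]}        # Nc1ccccc1

print(pair_category(phenol, nitrophenol))  # single_fg_impact
print(pair_category(phenol, aniline))      # multi_fg_interaction
```

Under this rule, replacing the hydroxyl of phenol with an amine counts as two edits (one removal, one addition), so a substitution lands in the multi-FG bucket; a real pipeline might treat substitutions as their own case.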

Problem

Research questions and friction points this paper is trying to address.

Enhance molecular property reasoning with functional group-level data

Address the lack of fine-grained functional group information in existing chemistry datasets for LLMs

Improve interpretability and structure-awareness in chemistry-focused LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frames molecular property reasoning at the functional group level: single-FG impacts, multi-FG interactions, and direct molecular comparisons

Precisely annotates and localizes functional groups within molecules, covering 245 FG types

Includes regression and classification tasks
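A single annotated FG edit can yield both task types. The sketch below turns one hypothetical `FGEdit` record into a regression question (how much the property changes) and a classification question (whether the property increases, with zero change treated as "no"). The record fields, the question wording, and the use of the signed property difference as the regression target are illustrative assumptions, not FGBench's templates.

```python
from dataclasses import dataclass
from typing import Dict, Union


# Hypothetical record: the target molecule is the reference molecule with one
# functional group added or removed, and both have a known property value.
@dataclass(frozen=True)
class FGEdit:
    ref_smiles: str
    target_smiles: str
    fg_name: str
    action: str          # "added" or "removed"
    prop_name: str
    ref_value: float
    target_value: float


_VERBS = {"added": "adding", "removed": "removing"}


def make_questions(edit: FGEdit) -> Dict[str, Dict[str, Union[str, float]]]:
    """Build one regression and one classification question from an FG edit."""
    if edit.action not in _VERBS:
        raise ValueError(f"unknown action: {edit.action!r}")
    stem = (
        f"The reference molecule {edit.ref_smiles} has {edit.prop_name} = {edit.ref_value}. "
        f"The target molecule {edit.target_smiles} is obtained by "
        f"{_VERBS[edit.action]} one {edit.fg_name} group. "
    )
    delta = edit.target_value - edit.ref_value
    return {
        # Regression target (assumption): the signed change in the property.
        "regression": {
            "question": stem + f"By how much does {edit.prop_name} change?",
            "answer": round(delta, 4),
        },
        # Classification target (assumption): does the property increase?
        "classification": {
            "question": stem + f"Does {edit.prop_name} increase? Answer yes or no.",
            "answer": "yes" if delta > 0 else "no",
        },
    }
```

Asking for the signed change rather than the target's absolute value keeps each question focused on the functional group's effect; either design is plausible, and the sketch does not claim to match how FGBench defines its targets.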
