From Knowledge to Reasoning: Evaluating LLMs for Ionic Liquids Research in Chemical and Biological Engineering

📅 2025-05-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) lack rigorous evaluation of their scientific reasoning capabilities in chemical and biological engineering (CBE), particularly for ionic liquid (IL)-based carbon capture, a critical carbon-neutral technology. Method: The paper introduces IL-CapBench, an expert-annotated benchmark of 5,920 instances for reasoning about IL-based carbon capture. The benchmark varies difficulty along two dimensions, linguistic understanding and domain-specific knowledge, and is constructed via expert co-annotation, controllable difficulty design, and domain-consistency validation. Open-weight models under 10B parameters, including Phi-3, Qwen2, and Llama3, are evaluated in zero-shot and few-shot settings. Contribution/Results: The analysis shows that while small LLMs possess basic IL knowledge, their domain-specific scientific reasoning remains severely limited. The paper further argues that gearing LLMs toward IL research can offset their high carbon footprint, pairing model improvement with carbon-footprint reduction and providing empirical grounding for deploying LLMs in carbon-neutral research. This work establishes a CBE-specialized LLM benchmark and advances trustworthy AI for sustainable chemistry.

📝 Abstract
Although Large Language Models (LLMs) have achieved remarkable performance on diverse general knowledge and reasoning tasks, their utility in the scientific domain of Chemical and Biological Engineering (CBE) is unclear. This necessitates challenging evaluation benchmarks that can measure LLM performance on knowledge- and reasoning-based tasks, which are currently lacking. As a foundational step, we empirically measure the reasoning capabilities of LLMs in CBE. We construct and share an expert-curated dataset of 5,920 examples for benchmarking LLMs' reasoning capabilities in the niche domain of Ionic Liquids (ILs) for carbon sequestration, an emergent solution for reducing global warming. The dataset presents different difficulty levels by varying along the dimensions of linguistic and domain-specific knowledge. Benchmarking three sub-10B-parameter open-source LLMs on the dataset suggests that while smaller general-purpose LLMs are knowledgeable about ILs, they lack domain-specific reasoning capabilities. Based on our results, we further discuss considerations for leveraging LLMs in carbon capture research using ILs. Since LLMs have a high carbon footprint, gearing them toward IL research can symbiotically benefit both fields and help reach the ambitious carbon neutrality target by 2050. Dataset link: https://github.com/sougata-ub/llms_for_ionic_liquids
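The zero-shot versus few-shot evaluation described in the abstract can be sketched as a prompt-construction step: the same benchmark instance is either posed directly or prefixed with worked exemplars. This is a minimal illustrative sketch, not the paper's actual harness; the instance schema (`question`, `options`, `answer` fields) and the example content are assumptions, not the real IL-CapBench format.

```python
def build_prompt(instance, exemplars=None):
    """Format a multiple-choice benchmark instance as a zero-shot prompt
    (no exemplars) or a few-shot prompt (prefixed with answered exemplars).

    Hypothetical schema: each instance is a dict with a "question" string,
    "options" as (label, text) pairs, and (for exemplars) an "answer" label.
    """
    def render(question, options, answer=None):
        # Lay out one question with lettered options; leave the answer
        # slot open for the item the model must complete.
        lines = [f"Question: {question}"]
        lines += [f"{label}. {text}" for label, text in options]
        lines.append(f"Answer: {answer}" if answer else "Answer:")
        return "\n".join(lines)

    parts = [
        render(ex["question"], ex["options"], ex["answer"])
        for ex in (exemplars or [])
    ]
    parts.append(render(instance["question"], instance["options"]))
    return "\n\n".join(parts)
```

In use, the zero-shot call is `build_prompt(instance)` and the few-shot call is `build_prompt(instance, exemplars=train_examples)`; the resulting string would then be sent to an open-weight model (e.g., via a text-generation API) and the model's completion compared against the gold answer label.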
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' reasoning in Chemical and Biological Engineering
Assessing LLMs' performance in Ionic Liquids for carbon capture
Addressing lack of domain-specific reasoning in small LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expert-curated dataset for Ionic Liquids research
Benchmarking small LLMs on domain-specific reasoning
Symbiotic application of LLMs for carbon neutrality
Gaurab Sarkar (State University of New York at Buffalo, Department of Chemical and Biological Engineering)
Sougata Saha (MBZUAI)
Tags: NLP, LLM, Culture, AI