🤖 AI Summary
This work investigates whether large language models (LLMs) can serve as general-purpose scientific agents for feedback-driven Bayesian experimental design, e.g., genetic perturbation and molecular property discovery. Experiments reveal that current instruction-tuned LLMs are insensitive to experimental feedback, fail to adapt their strategies dynamically, and significantly underperform classical methods. To address this, the authors propose LLMNN, a fine-tuning-free hybrid framework that integrates LLMs' prior knowledge with feedback-aware nearest-neighbor sampling for efficient contextual experimental design, bypassing the need for strong in-context adaptation or parameter updates. Evaluated across multiple scientific domains, LLMNN matches or surpasses established baselines, including Gaussian processes and linear bandits, demonstrating for the first time the feasibility and competitiveness of coupling LLMs with lightweight feedback mechanisms in closed-loop scientific discovery.
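The feedback-insensitivity finding rests on a simple control: keep the queried designs but randomly permute their measured outcomes before returning them to the agent, then check whether performance degrades. A minimal sketch of that control follows; the `(design, outcome)` history format is an illustrative assumption, not the paper's actual code.

```python
import random

def permute_feedback(history, seed=0):
    """Control condition: shuffle the measured outcomes across the
    queried designs, destroying any true design-to-outcome signal
    while preserving the marginal distribution of outcomes.

    `history` is assumed to be a list of (design, outcome) pairs.
    """
    rng = random.Random(seed)
    outcomes = [y for _, y in history]
    rng.shuffle(outcomes)
    # Re-pair each design with a randomly drawn outcome.
    return [(x, y) for (x, _), y in zip(history, outcomes)]
```

If an agent truly conditions on feedback, running it on `permute_feedback(history)` should hurt its downstream performance; the paper reports no such gap for the LLM agents tested.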
📝 Abstract
Large language models (LLMs) have recently been proposed as general-purpose agents for experimental design, with claims that they can perform in-context experimental design. We evaluate this hypothesis using both open- and closed-source instruction-tuned LLMs applied to genetic perturbation and molecular property discovery tasks. We find that LLM-based agents show no sensitivity to experimental feedback: replacing true outcomes with randomly permuted labels has no impact on performance. Across benchmarks, classical methods such as linear bandits and Gaussian process optimization consistently outperform LLM agents. We further propose a simple hybrid method, LLM-guided Nearest Neighbour (LLMNN) sampling, that combines LLM prior knowledge with nearest-neighbor sampling to guide the design of experiments. LLMNN achieves competitive or superior performance across domains without requiring significant in-context adaptation. These results suggest that current open- and closed-source LLMs do not perform in-context experimental design in practice and highlight the need for hybrid frameworks that decouple prior-based reasoning from batch acquisition with updated posteriors.
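The LLMNN idea described above can be sketched as a two-phase acquisition rule: before any feedback exists, rank candidates by an LLM-derived prior score; once outcomes arrive, select the unmeasured candidates nearest (in some feature or embedding space) to the best observed design. Everything here, including the function name `llmnn_select` and the use of Euclidean distance over a candidate embedding matrix, is an assumption for illustration, not the authors' implementation.

```python
import numpy as np

def llmnn_select(embeddings, prior_scores, observed, batch_size):
    """Pick the next batch of experiments.

    embeddings   : (n, d) array of candidate feature vectors
    prior_scores : (n,) scores from an LLM prior ranking (stubbed here)
    observed     : dict {candidate index: measured outcome}
    """
    n = embeddings.shape[0]
    candidates = [i for i in range(n) if i not in observed]
    if not observed:
        # Round 1: no feedback yet, so rely on the LLM prior alone.
        return sorted(candidates, key=lambda i: -prior_scores[i])[:batch_size]
    # Feedback-aware step: centre the search on the best observed design
    # and take its nearest unmeasured neighbours in embedding space.
    best = max(observed, key=observed.get)
    dists = np.linalg.norm(embeddings[candidates] - embeddings[best], axis=1)
    order = np.argsort(dists)
    return [candidates[j] for j in order[:batch_size]]
```

This decouples the two roles the abstract identifies: prior-based reasoning (the LLM ranking) seeds the search, while batch acquisition updates purely from observed outcomes, so no in-context adaptation by the LLM is required.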