Purely Agentic Black-Box Optimization for Biological Design

Date: 2026-01-29

AI Summary
This work addresses the challenge of integrating knowledge from the scientific literature into black-box optimization for biological design tasks such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering. It proposes an end-to-end framework driven by language agents, which casts the optimization process as hierarchical, language-based agent reasoning using scientific large language models pretrained on chemistry and biology literature. The agentic formulation naturally accommodates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints, moving beyond optimizers that work mainly on raw structural data. On the GuacaMol molecular design and antimicrobial peptide optimization benchmarks, the method achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. In follow-up in vitro experiments, the optimized peptides showed strong activity against drug-resistant pathogens.

Abstract
Many key challenges in biological design (such as small-molecule drug discovery, antimicrobial peptide development, and protein engineering) can be framed as black-box optimization over vast, complex structured spaces. Existing methods rely mainly on raw structural data and struggle to exploit the rich scientific literature. While large language models (LLMs) have been added to these pipelines, they have been confined to narrow roles within structure-centered optimizers. We instead cast biological black-box optimization as a fully agentic, language-based reasoning process. We introduce Purely Agentic BLack-box Optimization (PABLO), a hierarchical agentic system that uses scientific LLMs pretrained on chemistry and biology literature to generate and iteratively refine biological candidates. On both the standard GuacaMol molecular design and antimicrobial peptide optimization tasks, PABLO achieves state-of-the-art performance, substantially improving sample efficiency and final objective values over established baselines. Compared to prior optimization methods that incorporate LLMs, PABLO achieves competitive token usage per run despite relying on LLMs throughout the optimization loop. Beyond raw performance, the agentic formulation offers key advantages for realistic design: it naturally incorporates semantic task descriptions, retrieval-augmented domain knowledge, and complex constraints. In follow-up in vitro validation, PABLO-optimized peptides showed strong activity against drug-resistant pathogens, underscoring the practical potential of PABLO for therapeutic discovery.
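The abstract frames design as black-box optimization: candidates are proposed, scored by an expensive oracle, and refined from the best results so far. As a minimal sketch of that generic loop only, the Python below uses two hypothetical stand-ins: `toy_objective`, a crude fraction-of-lysine/arginine score in place of a real assay or benchmark oracle, and `propose`, random point mutations in place of the LLM agents. It does not reproduce PABLO's hierarchical agent design, prompts, or retrieval, none of which are specified here.

```python
import random
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_objective(seq: str) -> float:
    # Hypothetical oracle: fraction of cationic residues (K, R).
    # A real run would call a wet-lab assay or a benchmark scoring function.
    return sum(c in "KR" for c in seq) / len(seq)


def propose(parents: List[str], rng: random.Random, n: int) -> List[str]:
    # Hypothetical proposer: one random point mutation per child.
    # An agentic optimizer would replace this with LLM-generated candidates.
    children = []
    for _ in range(n):
        residues = list(rng.choice(parents))
        residues[rng.randrange(len(residues))] = rng.choice(AMINO_ACIDS)
        children.append("".join(residues))
    return children


def optimize(
    objective: Callable[[str], float],
    seeds: List[str],
    budget: int,
    batch_size: int = 8,
    top_k: int = 4,
    max_rounds: int = 1000,
    seed: int = 0,
) -> Tuple[str, float, Dict[str, float]]:
    rng = random.Random(seed)
    # Every evaluated sequence and its score; seeds count toward the budget.
    history: Dict[str, float] = {s: objective(s) for s in seeds}
    for _ in range(max_rounds):
        if len(history) >= budget:
            break
        # Use the current top-k sequences as parents for the next batch.
        parents = sorted(history, key=history.get, reverse=True)[:top_k]
        for cand in propose(parents, rng, batch_size):
            # Skip duplicates so each oracle call is spent on a new sequence.
            if cand not in history and len(history) < budget:
                history[cand] = objective(cand)
    best = max(history, key=history.get)
    return best, history[best], history


if __name__ == "__main__":
    best, score, history = optimize(toy_objective, ["GIGAVLKVLT", "AAAAAAAAAA"], budget=60)
    print(best, round(score, 2), len(history))
```

Here `budget` caps the number of oracle calls, the resource that sample-efficiency comparisons count; the `propose` step is where an LLM-driven method would differ from this random-mutation baseline.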
Problem

black-box optimization
biological design
large language models
antimicrobial peptide
molecular design
Innovation

agentic optimization
black-box optimization
large language models
biological design
retrieval-augmented reasoning
Authors

Natalie Maus: PhD Student, University of Pennsylvania Department of Computer and Information Science. Interests: machine learning, Bayesian optimization, deep learning, generative modeling, computational drug design
Yimeng Zeng: PhD Student, University of Pennsylvania. Interests: machine learning, Bayesian optimization, generative models, large language models
Haydn Thomas Jones: University of Pennsylvania, Philadelphia, PA, USA
Yining Huang: University of Pennsylvania, Philadelphia, PA, USA
Gaurav Ng Goel: University of Pennsylvania, Philadelphia, PA, USA
Alden Rose: University of Pennsylvania, Philadelphia, PA, USA
Kyurae Kim: PhD Student, University of Pennsylvania. Interests: Bayesian inference, stochastic optimization, machine learning, signal processing
Hyun-Su Lee: University of Pennsylvania, Philadelphia, PA, USA
Marcelo Der Torossian Torres: University of Pennsylvania. Interests: peptide chemistry, antimicrobial peptides, peptide design
Fangping Wan: University of Pennsylvania, Philadelphia, PA, USA
Cesar de la Fuente-Nunez: University of Pennsylvania, Philadelphia, PA, USA
Mark Yatskar: University of Pennsylvania. Interests: language and vision, natural language processing, computer vision, fairness in AI, machine learning
Osbert Bastani: University of Pennsylvania. Interests: machine learning, artificial intelligence, programming languages, security
Jacob R. Gardner: University of Pennsylvania, Philadelphia, PA, USA