How to make the most of your masked language model for protein engineering

📅 2026-03-11
🤖 AI Summary
Protein language models are plentiful, but there is little systematic guidance on how to sample from them to optimize specific biological properties. This work proposes a flexible sampling strategy based on stochastic beam search, which uses a masked language model to evaluate the pseudo-perplexity of a sequence's entire single-mutation (1-edit) neighborhood and thereby reframes generation as whole-sequence evaluation that can be guided by multiple optimization objectives. Extensive in vitro head-to-head experiments in antibody engineering show that the choice of sampling method is at least as impactful as the choice of model, highlighting the importance of sampling design in protein engineering.

📝 Abstract
A plethora of protein language models have been released in recent years. Yet comparatively little work has addressed how to best sample from them to optimize desired biological properties. We fill this gap by proposing a flexible, effective sampling method for masked language models (MLMs), and by systematically evaluating models and methods both in silico and in vitro on actual antibody therapeutics campaigns. Firstly, we propose sampling with stochastic beam search, exploiting the fact that MLMs are remarkably efficient at evaluating the pseudo-perplexity of the entire 1-edit neighborhood of a sequence. Reframing generation in terms of entire-sequence evaluation enables flexible guidance with multiple optimization objectives. Secondly, we report results from our extensive in vitro head-to-head evaluation for the antibody engineering setting. This reveals that choice of sampling method is at least as impactful as the model used, motivating future research into this under-explored area.
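
As a rough illustration of the sampling idea described in the abstract, the Python sketch below runs a stochastic beam search over 1-edit neighborhoods, ranking candidates by pseudo-perplexity (the exponential of the mean negative log-probability of each residue when it is masked). Everything here is an assumption made for illustration: `toy_masked_log_probs` is a hypothetical stand-in for a real masked language model, the neighborhood is scored by brute force rather than with the efficient evaluation the abstract refers to, Gumbel-top-k is just one common way to make beam search stochastic, and the optional `guidance` callable is a hypothetical hook for additional objectives. It is not the authors' implementation.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_masked_log_probs(seq, pos):
    """Hypothetical stand-in for an MLM: log p(residue | seq with pos masked).

    The residue at `pos` is never read, mimicking masking. A real model
    would run a forward pass here instead of this hand-written rule.
    """
    left = seq[pos - 1] if pos > 0 else None
    logits = [(1.0 if aa == left else 0.0) + (0.5 if aa in "AILV" else 0.0)
              for aa in ALPHABET]
    log_z = math.log(sum(math.exp(l) for l in logits))
    return {aa: l - log_z for aa, l in zip(ALPHABET, logits)}


def pseudo_perplexity(seq):
    """exp of the mean negative masked log-probability over all positions."""
    nll = -sum(toy_masked_log_probs(seq, i)[seq[i]] for i in range(len(seq)))
    return math.exp(nll / len(seq))


def one_edit_neighborhood(seq):
    """All sequences exactly one substitution away from seq."""
    return [seq[:i] + aa + seq[i + 1:]
            for i in range(len(seq)) for aa in ALPHABET if aa != seq[i]]


def stochastic_beam_search(start, beam_width=4, steps=3,
                           temperature=0.1, guidance=None, seed=0):
    """Keep `beam_width` sequences per step, sampled via Gumbel-top-k."""
    rng = random.Random(seed)
    beam = [start]
    for _ in range(steps):
        # Candidates: the current beam plus every 1-edit neighbor.
        candidates = set(beam)
        for seq in beam:
            candidates.update(one_edit_neighborhood(seq))
        keyed = []
        for seq in sorted(candidates):  # sorted for reproducibility
            # Whole-sequence score: lower pseudo-perplexity is better.
            score = -math.log(pseudo_perplexity(seq)) / temperature
            if guidance is not None:  # hypothetical extra objective
                score += guidance(seq)
            u = max(rng.random(), 1e-12)  # avoid log(0)
            gumbel = -math.log(-math.log(u))
            keyed.append((score + gumbel, seq))
        # Top-k of Gumbel-perturbed scores samples without replacement.
        keyed.sort(reverse=True)
        beam = [seq for _, seq in keyed[:beam_width]]
    return sorted(beam, key=pseudo_perplexity)
```

Calling `stochastic_beam_search("MKTAYG", beam_width=3, steps=2)` returns three sequences of length six, each within two substitutions of the start, ordered from lowest to highest pseudo-perplexity under the toy model. Because candidates are scored as whole sequences, an extra objective can be added by passing any function of the sequence as `guidance`.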
Problem

Research questions and friction points this paper is trying to address.

masked language model
protein engineering
sampling method
antibody optimization
biological properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

masked language models
stochastic beam search
protein engineering
sequence optimization
antibody therapeutics