AI Summary
Current protein foundation models lack test-time scalability and a unified capability across diverse protein design tasks. Method: We propose AMix-1, a Bayesian Flow Network-based model that integrates MSA-driven in-context learning with an evolutionary test-time scaling algorithm, with training guided by pretraining scaling laws. Contribution/Results: AMix-1 is the first model to demonstrate the progressive emergence of structural understanding capabilities as pretraining scales, and it efficiently generates functional proteins directly from sequence inputs. Experimentally, it designed an AmeR variant with up to 50× higher activity than the wild type. Moreover, its performance improves consistently as the test-time compute budget increases, confirming strong test-time scalability. Together, these advances establish a closed-loop paradigm for *in silico* directed evolution followed by wet-lab validation, substantially enhancing protein engineering capacity.
Abstract
We introduce AMix-1, a powerful protein foundation model built on Bayesian Flow Networks and empowered by a systematic training methodology. This methodology encompasses pretraining scaling laws, emergent capability analysis, an in-context learning mechanism, and a test-time scaling algorithm. To guarantee robust scalability, we establish a predictive scaling law. From the perspective of the training loss, we also reveal the progressive emergence of structural understanding, and this work culminates in a strong 1.7-billion-parameter model. Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy that unifies protein design into a general framework. Within this framework, AMix-1 recognizes deep evolutionary signals among MSAs and consistently generates structurally and functionally coherent proteins. The framework enables the successful design of a dramatically improved AmeR variant with up to a $50\times$ activity increase over its wild type. Pushing the boundaries of protein engineering, we further empower AMix-1 with an evolutionary test-time scaling algorithm for in silico directed evolution. This algorithm delivers substantial, scalable performance gains as verification budgets increase, laying the groundwork for next-generation lab-in-the-loop protein design.
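The abstract does not specify how the evolutionary test-time scaling algorithm works. The sketch below is therefore only a generic propose-verify-select loop in the spirit of in silico directed evolution, not AMix-1's actual method. It makes two labelled assumptions. First, `propose` is a hypothetical stand-in that applies random point mutations where the real system would sample variants from the model. Second, `toy_verifier` is a hypothetical fitness oracle that scores identity to a made-up target sequence. The sketch shows one property of such loops. Each verifier call tests one new variant, and the loop keeps the best variant seen. With a fixed random seed, a larger budget therefore evaluates a superset of the variants tested under a smaller one, so the best verified score cannot decrease as the budget grows.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_verifier(seq: str, target: str) -> float:
    """Hypothetical fitness oracle: fraction of positions matching a target sequence."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def propose(parent: str, rng: random.Random) -> str:
    """Hypothetical proposal step: a single random point mutation.

    Stands in for sampling a variant from a generative model; not AMix-1's sampler.
    """
    seq = list(parent)
    i = rng.randrange(len(seq))
    seq[i] = rng.choice(AMINO_ACIDS)
    return "".join(seq)


def evolve(wild_type: str, target: str, budget: int,
           pop_size: int = 8, n_offspring: int = 4, seed: int = 0):
    """Propose-verify-select loop; `budget` caps verifier calls on new variants."""
    rng = random.Random(seed)
    population = [(toy_verifier(wild_type, target), wild_type)]
    best_score, best_seq = population[0]
    evaluations = 0
    while evaluations < budget:
        children = []
        for _, parent in population:
            for _ in range(n_offspring):
                if evaluations >= budget:
                    break
                child = propose(parent, rng)
                children.append((toy_verifier(child, target), child))
                evaluations += 1
        # Elitist selection: keep the top `pop_size` among parents and children.
        pool = sorted(population + children, key=lambda x: x[0], reverse=True)
        population = pool[:pop_size]
        if population[0][0] > best_score:
            best_score, best_seq = population[0]
    return best_seq, best_score


if __name__ == "__main__":
    wt = "A" * 20                    # hypothetical wild type
    target = "MKTAYIAKQRQISFVKSHFS"  # hypothetical optimum for the toy oracle
    for budget in (8, 32, 128, 512):
        print(budget, evolve(wt, target, budget)[1])
```

Elitist selection is one simple design choice here: the best verified variant always survives into the next generation. In a lab-in-the-loop setting, the toy oracle would be replaced by a learned or experimental fitness measurement.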