AMix-1: A Pathway to Test-Time Scalable Protein Foundation Model

📅 2025-07-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Current protein foundation models lack test-time scalability and unified capability across diverse protein design tasks. Method: We propose AMix-1, a Bayesian flow network-based model integrating MSA-driven in-context learning with an evolutionary test-time scaling algorithm, optimized according to pretraining scaling laws. Contribution/Results: AMix-1 is the first model to demonstrate progressive emergence of structural understanding capabilities *during inference*, enabling efficient generation of functional proteins directly from sequence inputs. Experimentally, it designed an AmeR variant with 50× enhanced activity. Moreover, its performance consistently improves with increased computational budget at test time, confirming strong test-time scalability. These advances establish a closed-loop paradigm for *in silico* directed evolution followed by wet-lab validation, significantly enhancing protein engineering capacity.

📝 Abstract
We introduce AMix-1, a powerful protein foundation model built on Bayesian Flow Networks and empowered by a systematic training methodology, encompassing pretraining scaling laws, emergent capability analysis, an in-context learning mechanism, and a test-time scaling algorithm. To guarantee robust scalability, we establish a predictive scaling law and reveal the progressive emergence of structural understanding from a loss perspective, culminating in a strong 1.7-billion-parameter model. Building on this foundation, we devise a multiple sequence alignment (MSA)-based in-context learning strategy to unify protein design into a general framework, where AMix-1 recognizes deep evolutionary signals among MSAs and consistently generates structurally and functionally coherent proteins. This framework enables the successful design of a dramatically improved AmeR variant with an up to $50\times$ activity increase over its wild type. Pushing the boundaries of protein engineering, we further empower AMix-1 with an evolutionary test-time scaling algorithm for in silico directed evolution that delivers substantial, scalable performance gains as verification budgets are intensified, laying the groundwork for next-generation lab-in-the-loop protein design.
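To give a flavor of how an MSA carries the evolutionary signal the abstract refers to, here is a toy sketch: it builds a per-position amino-acid frequency profile from an aligned MSA and samples candidate sequences from it. This is a hypothetical stand-in, not the paper's method; AMix-1 conditions a Bayesian Flow Network on the MSA prompt rather than on raw column frequencies.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def msa_profile(msa):
    """Per-position amino-acid frequency profile from aligned sequences.

    A crude proxy for the conservation signal an MSA prompt exposes;
    the real model learns far richer co-evolutionary dependencies.
    """
    length = len(msa[0])
    assert all(len(s) == length for s in msa), "MSA rows must be aligned"
    profile = []
    for i in range(length):
        counts = Counter(s[i] for s in msa)
        total = sum(counts.values())
        profile.append({aa: c / total for aa, c in counts.items()})
    return profile

def sample_from_profile(profile, rng):
    """Sample one sequence column-by-column from the profile."""
    seq = []
    for col in profile:
        aas, probs = zip(*col.items())
        seq.append(rng.choices(aas, weights=probs, k=1)[0])
    return "".join(seq)

rng = random.Random(0)
msa = ["ACDE", "ACDF", "ACGE", "TCDE"]
profile = msa_profile(msa)
candidate = sample_from_profile(profile, rng)
```

Fully conserved columns (all `C` at position 1 above) are reproduced verbatim in every sample, while variable columns are resampled, which is the intuition behind generating "structurally and functionally coherent" variants from an MSA prompt.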
Problem

Research questions and friction points this paper is trying to address.

Develops scalable protein foundation model AMix-1
Unifies protein design via MSA-based learning framework
Enables test-time scaling for in silico directed evolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian Flow Networks for protein modeling
MSA-based in-context learning strategy
Evolutionary test-time scaling algorithm
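The evolutionary test-time scaling idea (propose candidates, verify them, keep the best as seeds for the next round) can be sketched as a toy loop. The point-mutation proposer and the motif-matching `verifier` below are hypothetical stand-ins for AMix-1's generative proposals and a real fitness oracle, respectively; only the propose-verify-select structure reflects the paper.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(seq, rng, rate=0.1):
    """Point-mutate a sequence; a stand-in for the generative proposal step."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )

def evolve(seed, verifier, rounds, population, rng, top_k=4):
    """Propose-verify-select loop: a toy version of evolutionary
    test-time scaling. The best verified candidate is tracked across
    rounds, so spending more verification budget never hurts."""
    pool = [seed]
    best = (verifier(seed), seed)
    for _ in range(rounds):
        candidates = [mutate(rng.choice(pool), rng) for _ in range(population)]
        scored = sorted(((verifier(c), c) for c in candidates), reverse=True)
        pool = [c for _, c in scored[:top_k]]
        best = max(best, scored[0])
    return best

# Hypothetical verifier: fraction of positions matching a target motif.
TARGET = "MKTAYIAKQR"
def verifier(seq):
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)

rng = random.Random(42)
score_small, _ = evolve("A" * len(TARGET), verifier, rounds=2, population=8, rng=rng)
rng = random.Random(42)
score_large, _ = evolve("A" * len(TARGET), verifier, rounds=20, population=32, rng=rng)
```

Here `rounds * population` plays the role of the verification budget: a larger budget explores more candidates per selection step, which is the test-time scaling axis the paper reports gains along.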
Changze Lv
Shanghai Artificial Intelligence Laboratory
Jiang Zhou
Generative Symbolic Intelligence Lab (GenSI), Tsinghua University
Siyu Long
Institute for AI Industry Research (AIR), Tsinghua University
AI4Science, Natural Language Processing
Lihao Wang
Zoox
Artificial Intelligence, Computer Vision, Machine Learning, Autonomous Driving
Jiangtao Feng
Shanghai Artificial Intelligence Laboratory
Dongyu Xue
Shanghai Artificial Intelligence Laboratory
Yu Pei
Shanghai Artificial Intelligence Laboratory
Hao Wang
Shanghai Artificial Intelligence Laboratory
Zherui Zhang
Shanghai Artificial Intelligence Laboratory
Yuchen Cai
Shanghai Artificial Intelligence Laboratory
Zhiqiang Gao
Shanghai Artificial Intelligence Laboratory
Ziyuan Ma
Shanghai Artificial Intelligence Laboratory
Jiakai Hu
Shanghai Artificial Intelligence Laboratory
Chaochen Gao
Institute of Information Engineering, Chinese Academy of Sciences
NLP, Long-Context LLM
Jingjing Gong
SII
Machine Learning, AI for Science, Large Language Model, Embodied AI
Yuxuan Song
Tsinghua University
Deep Generative Models, LLM4Science
Shuyi Zhang
East China Normal University
Big data analysis, Semi-supervised learning, High-dimensional statistics, Applied data science
Xiaoqing Zheng
Fudan University
Natural Language Processing and Machine Learning
Deyi Xiong
Professor, College of Intelligence and Computing, Tianjin University, China
Natural Language Processing, Large Language Models, AI4Science
Lei Bai
Shanghai AI Laboratory
Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery
Ya-Qin Zhang
Shanghai Artificial Intelligence Laboratory
Wei-Ying Ma
Tsinghua University
Generative AI and Large Language Models (LLMs) for Science
Bowen Zhou
Shanghai Artificial Intelligence Laboratory
Hao Zhou
Institute for AI Industry Research (AIR), Tsinghua University