🤖 AI Summary
Problem: Current large-scale self-supervised models adapt poorly to AI-driven biological design. Method: We systematically review foundation models for protein engineering, small-molecule design, and genomic sequence design, and present a taxonomy of current models and methods. The analysis covers the core challenges in adapting these models to biological applications, including biological sequence modeling architectures (e.g., Transformer and CNN-RNN hybrids), controllable generation (via prompting and latent-space steering), and multimodal alignment. Contribution/Results: We identify critical bottlenecks, notably low functional plausibility and poor experimental verifiability, and outline concrete next steps for improving the biological validity and wet-lab feasibility of generated sequences, with the aim of advancing the practical use of foundation models in synthetic biology.
📝 Abstract
This paper surveys foundation models for AI-enabled biological design, focusing on recent developments in applying large-scale, self-supervised models to tasks such as protein engineering, small-molecule design, and genomic sequence design. Although the domain is evolving rapidly, the survey presents and discusses a taxonomy of current models and methods. The focus is on the challenges of adapting these models to biological applications, and on proposed solutions, including biological sequence modeling architectures, controllability in generation, and multimodal integration. The survey concludes with a discussion of open problems and future directions, offering concrete next steps for improving the quality of biological sequence generation.