🤖 AI Summary
Existing de novo protein design methods struggle to achieve functional efficacy and foldability at the same time. This work proposes CodeFP, a co-generative protein language model that jointly decodes protein sequence and structure tokens. By enriching functional semantic encodings with functional local structures and adding auxiliary functional supervision, CodeFP mitigates the training ambiguity caused by the one-to-many structure-to-token mapping. Compared with the strongest baseline, it achieves average gains of 6.1% in functional consistency and 3.2% in foldability, indicating that co-generation can yield proteins that are both functional and reliably foldable.
📝 Abstract
De novo functional protein design aims to generate protein sequences that realize specified biochemical functions without relying on evolutionary templates, enabling broad applications in biotechnology and medicine. Existing approaches adopt either direct function-to-sequence mapping or decoupled structure-sequence generation, but they often fail to achieve functionality and foldability simultaneously. To address this, we propose CodeFP, a Co-generative protein language model for de novo Functional Protein design that decodes sequence and structure tokens simultaneously, so that functionality and foldability are realized together. CodeFP uses functional local structures to enrich functional semantic encodings, overcoming the suboptimal translation of flat encodings into structure tokens, and introduces auxiliary functional supervision to alleviate the training ambiguity stemming from the one-to-many structure-to-token mapping. Extensive experiments show that CodeFP achieves average improvements of 6.1% in functional consistency and 3.2% in foldability over the strongest baseline.
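The abstract does not state the training objective, so the following is only an illustrative sketch of how joint decoding with auxiliary functional supervision might be combined into one loss. Everything in it is an assumption made for illustration: the function names `cross_entropy` and `co_generation_loss`, the additive form of the loss, and the `aux_weight` parameter are hypothetical and are not taken from CodeFP.

```python
import math

def cross_entropy(logits, target):
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def co_generation_loss(seq_logits, struct_logits, func_logits,
                       seq_targets, struct_targets, func_target,
                       aux_weight=0.5):
    """Hypothetical joint objective (an assumption, not the paper's loss).

    Averages per-position losses for sequence tokens and structure tokens,
    then adds a weighted auxiliary function-classification loss.
    """
    n = len(seq_targets)
    seq_loss = sum(cross_entropy(l, t)
                   for l, t in zip(seq_logits, seq_targets)) / n
    struct_loss = sum(cross_entropy(l, t)
                      for l, t in zip(struct_logits, struct_targets)) / n
    aux_loss = cross_entropy(func_logits, func_target)
    return seq_loss + struct_loss + aux_weight * aux_loss

# Toy example: two positions, vocabularies of size 2, uniform logits.
uniform = [[0.0, 0.0], [0.0, 0.0]]
loss = co_generation_loss(uniform, uniform, [0.0, 0.0],
                          [0, 1], [1, 0], 0)
print(loss)
```

In this sketch the auxiliary term gives the model a function-level signal that does not depend on which of several valid structure tokenizations is used as the target, which is one plausible reading of how extra supervision could reduce ambiguity from a one-to-many mapping.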