🤖 AI Summary
Molecular generation and property prediction have long been treated as separate tasks; modeling them jointly raises architectural and optimization challenges, since the two objectives can conflict within a single network. Method: We propose Hyformer, a Transformer-based joint generative-discriminative model that combines an alternating attention masking mechanism with a unified pre-training scheme. It optimizes autoregressive generation and property prediction within a single architecture, so that each capability benefits from the shared representation. Contribution/Results: By modeling the joint distribution of molecules and their properties in a shared representation space, Hyformer rivals other joint models as well as state-of-the-art single-task generative and predictive models across diverse downstream tasks, including molecular representation learning, hit identification, and antimicrobial peptide design. It improves both generation quality (e.g., validity, uniqueness, novelty) and prediction accuracy (e.g., regression and classification metrics), demonstrating the benefit of co-optimizing generative and discriminative objectives in a shared latent space.
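To illustrate what the alternating attention masking mechanism could look like in practice, here is a minimal PyTorch sketch, assuming the model switches between a causal mask for the generative pass and a fully bidirectional mask for the predictive pass; the function name `build_attention_mask` and the mode flag are hypothetical and not taken from the paper.

```python
import torch

def build_attention_mask(seq_len: int, generative: bool) -> torch.Tensor:
    """Additive attention mask: causal in generative mode,
    fully bidirectional in predictive mode (assumed behavior)."""
    if generative:
        # Lower-triangular mask: token i attends only to tokens j <= i,
        # as required for autoregressive next-token generation.
        allowed = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    else:
        # Unrestricted mask: every token attends to every other token,
        # giving encoder-style bidirectional context for property prediction.
        allowed = torch.ones(seq_len, seq_len, dtype=torch.bool)
    mask = torch.zeros(seq_len, seq_len)
    mask.masked_fill_(~allowed, float("-inf"))  # -inf zeroes out disallowed links after softmax
    return mask

# Alternating between the two modes, e.g. across training batches:
causal_mask = build_attention_mask(8, generative=True)         # generative pass
bidirectional_mask = build_attention_mask(8, generative=False)  # predictive pass
```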
📝 Abstract
Modeling the joint distribution of data samples and their properties makes it possible to construct a single model for both data generation and property prediction, with synergistic capabilities reaching beyond purely generative or predictive models. However, training joint models presents daunting architectural and optimization challenges. Here, we propose Hyformer, a transformer-based joint model that blends generative and predictive functionalities, using an alternating attention mask together with a unified pre-training scheme. We show that Hyformer rivals other joint models, as well as state-of-the-art molecule generation and property prediction models. Additionally, we show the benefits of joint modeling in the downstream tasks of molecular representation learning, hit identification, and antimicrobial peptide design.
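To make the unified pre-training scheme concrete, below is a minimal sketch of a joint objective that sums an autoregressive token loss with a property-prediction loss; the weighting factor `lam`, the tensor shapes, and the binary-classification head are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def joint_loss(lm_logits: torch.Tensor,
               target_tokens: torch.Tensor,
               prop_logits: torch.Tensor,
               target_props: torch.Tensor,
               lam: float = 1.0) -> torch.Tensor:
    """Combine generative and discriminative objectives into one scalar loss.

    lm_logits:     (batch, seq_len, vocab)  next-token predictions
    target_tokens: (batch, seq_len)         shifted token targets
    prop_logits:   (batch, num_props)       property predictions
    target_props:  (batch, num_props)       binary property labels
    """
    # Autoregressive language-modeling loss over the token sequence.
    gen_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)),
        target_tokens.reshape(-1),
    )
    # Discriminative loss on the predicted properties (binary here for simplicity).
    pred_loss = F.binary_cross_entropy_with_logits(prop_logits, target_props)
    # A shared backbone is trained on the weighted sum of both objectives.
    return gen_loss + lam * pred_loss
```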