🤖 AI Summary
Addressing the challenge of simultaneously achieving high molecular quality, diversity, and inference efficiency in text-to-molecule generation, this paper proposes CAT-VMF, a causal-aware molecular generation framework. First, it introduces the Causal-Aware Transformer (CAT) to explicitly model causal dependencies between textual inputs and molecular graph structures. Second, it develops a Variational Mean-Field (VMF) model with a mixture-of-Gaussians prior to enhance latent-space expressiveness and enable one-step, efficient sampling. Evaluated on four standard benchmarks, CAT-VMF achieves state-of-the-art performance: 74.5% novelty, 70.3% diversity, and 100% molecular validity, while requiring only a single function evaluation for conditional generation—substantially outperforming diffusion-based approaches. The core contribution lies in the novel integration of causal modeling with flow-based variational inference, enabling, for the first time, scalable, single-step, text-conditioned molecular generation without compromising generation quality.
📝 Abstract
Molecular generation conditioned on textual descriptions is a fundamental task in computational chemistry and drug discovery. Existing methods often struggle to simultaneously ensure high-quality, diverse generation and fast inference. In this work, we propose a novel causality-aware framework that addresses these challenges through two key innovations. First, we introduce a Causality-Aware Transformer (CAT) that jointly encodes molecular graph tokens and text instructions while enforcing causal dependencies during generation. Second, we develop a Variational Mean Flow (VMF) framework that generalizes existing flow-based methods by modeling the latent space as a mixture of Gaussians, enhancing expressiveness beyond unimodal priors. VMF enables efficient one-step inference while maintaining strong generation quality and diversity. Extensive experiments on four standard molecular benchmarks demonstrate that our model outperforms state-of-the-art baselines, achieving higher novelty (up to 74.5%), diversity (up to 70.3%), and 100% validity across all datasets. Moreover, VMF requires only one number of function evaluation (NFE) during conditional generation and up to five NFEs for unconditional generation, offering substantial computational efficiency over diffusion-based methods.