🤖 AI Summary
Existing molecular generative models are constrained by their pretraining data and struggle to explore unknown chemical space. This work proposes a self-guided reinforcement learning agent that requires no pretraining and autonomously constructs stable, novel three-dimensional isomers under given stoichiometric constraints. By combining online reinforcement learning, joint training over multiple compositions, physics-based energy evaluation, and geometric validity constraints, the method achieves, for the first time, molecular generation that generalizes across diverse chemical compositions. It avoids overfitting to individual formulas and discovers up to an order of magnitude more valid isomers on unseen compositions than existing single-composition reinforcement learning baselines, substantially improving exploration of uncharted chemical space.
📝 Abstract
Discovering novel stable molecules without training data remains a grand scientific challenge. Current molecular generative models are trained on large, pre-curated datasets, which introduce biases and limit the exploration of novel chemistry. In contrast, we propose a new paradigm: autonomous, generalized agents capable of mapping vast, unknown chemical spaces without any pretraining. For the first time, we present AtomComposer, a self-guided agent that autonomously constructs valid 3D isomers under stoichiometric constraints and is trained exclusively online using reinforcement learning. Whereas existing approaches generally overfit to a specific chemical formula, we establish a multi-composition training scheme, guided by energy- and validity-based rewards, that enables broad generalization across diverse chemistry. On unseen test formulas, our agent can discover up to an order of magnitude more valid isomers than existing single-composition reinforcement-learning baselines trained with per-step energy rewards. These results fulfill the promise of online reinforcement learning as a powerful paradigm for scalable, from-scratch exploration of chemical configuration space.