🤖 AI Summary
Existing polymer modeling approaches mostly represent a polymer by its constituent monomers and therefore fail to capture properties that emerge during polymerization. To address this, we propose MIPS (Multimodal Infinite Polymer Sequence), a multimodal pre-training framework that represents polymers as infinite monomer sequences and integrates topological and 3D spatial information. On the topological side, message passing and graph attention are generalized to infinite sequences through a star-linking strategy and localized graph attention (LGA). The robustness of star linking is checked with the Repeat and Shift Invariance Test (RSIT), and backbone embeddings are added to strengthen representation capacity where star linking alone falls short. On the spatial side, 3D descriptors of the repeating monomers are extracted, and a cross-modal fusion mechanism unifies the two channels. Evaluated on eight polymer property prediction tasks, MIPS achieves state-of-the-art performance.
📝 Abstract
Polymers, composed of repeating structural units called monomers, are fundamental materials in daily life and industry. Accurate property prediction for polymers is essential for their design, development, and application. However, existing modeling approaches, which typically represent polymers by their constituent monomers, struggle to capture the properties of the whole polymer, because these properties change during polymerization. In this study, we propose a Multimodal Infinite Polymer Sequence (MIPS) pre-training framework, which represents polymers as infinite sequences of monomers and integrates both topological and spatial information for comprehensive modeling. From the topological perspective, we generalize the message passing mechanism (MPM) and the graph attention mechanism (GAM) to infinite polymer sequences. For MPM, we demonstrate that applying MPM to an infinite polymer sequence is equivalent to applying MPM to the induced star-linking graph of its monomer. For GAM, we propose to further replace global graph attention with localized graph attention (LGA). Moreover, we show the robustness of the "star linking" strategy through the Repeat and Shift Invariance Test (RSIT). Despite its robustness, the "star linking" strategy has a limitation when monomer side chains contain ring structures, a common characteristic of polymers: in that case it fails the Weisfeiler-Lehman (WL) test. To overcome this issue, we propose a backbone embedding that enhances the capability of MPM and LGA on infinite polymer sequences. From the spatial perspective, we extract 3D descriptors of repeating monomers to capture spatial information. Finally, we design a cross-modal fusion mechanism to unify the topological and spatial information. Experimental validation across eight diverse polymer property prediction tasks shows that MIPS achieves state-of-the-art performance.
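The equivalence claimed for MPM can be illustrated with a small sketch. It is not the MIPS implementation: the function names (`star_link`, `unroll`, `message_pass`), the integer atom features, and the toy update rule `2*h[v] + sum(neighbors)` are all assumptions chosen for illustration. The sketch builds a star-linked monomer graph by adding one edge between the tail and head attachment atoms, then checks that a few rounds of message passing on it give the same atom features as the middle repeat unit of a long unrolled chain.

```python
def star_link(num_atoms, bonds, head, tail):
    """Adjacency lists of the monomer graph plus one extra tail-head edge."""
    adj = [[] for _ in range(num_atoms)]
    for u, v in bonds + [(tail, head)]:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def unroll(num_atoms, bonds, head, tail, copies):
    """Adjacency lists of a finite chain made of `copies` repeated monomers."""
    adj = [[] for _ in range(num_atoms * copies)]
    for c in range(copies):
        off = c * num_atoms
        edges = [(u + off, v + off) for u, v in bonds]
        if c + 1 < copies:
            # Bond from this copy's tail atom to the next copy's head atom.
            edges.append((tail + off, head + off + num_atoms))
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
    return adj


def message_pass(adj, feats, rounds):
    """Toy sum-aggregation update (hypothetical): h[v] <- 2*h[v] + sum of neighbor h."""
    for _ in range(rounds):
        feats = [2 * feats[v] + sum(feats[u] for u in adj[v])
                 for v in range(len(adj))]
    return feats


# Hypothetical monomer: backbone atoms 0-1-2 with a side atom 3 bonded to atom 1.
num_atoms, bonds, head, tail = 4, [(0, 1), (1, 2), (1, 3)], 0, 2
feats = [6, 6, 6, 8]  # atomic numbers used as initial features (assumption)
rounds, copies = 3, 9

folded = message_pass(star_link(num_atoms, bonds, head, tail), feats, rounds)
chain = message_pass(unroll(num_atoms, bonds, head, tail, copies),
                     feats * copies, rounds)
middle = chain[4 * num_atoms:5 * num_atoms]  # repeat unit far from both chain ends

print(folded == middle)
```

The comparison uses the middle copy because the two ends of a finite chain differ from an infinite sequence; with 9 copies and 3 rounds, the end effects cannot reach the middle unit. Note that this toy check covers only message passing and says nothing about the side-chain ring case that motivates the backbone embedding.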