Mixing Vector Model for Copolymer Inference via Mixed Integer Linear Programming

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses computational inverse design of copolymers with a prescribed mixing ratio and target property values, without requiring explicit sequence-class information. The authors propose a mixing vector (MV) model that represents a copolymer's feature vector as a convex combination of monomer descriptors, weighted by the mixing ratios of the constituent monomers. The representation is embedded in the mixed integer linear programming (MILP) framework of mol-infer, so the inferred candidates are exact with respect to the learned prediction function and the structural constraints. Prediction functions built with artificial neural networks, reduced quadratic multiple linear regression, and random forests are evaluated on ten physicochemical property datasets; the best test R² exceeds 0.7 on nine of them and 0.9 on six. The resulting MILP instances remain tractable even in three-monomer settings, and an external consistency check compares re-computed property values of the inferred candidates with the values predicted by the learned model.
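
As a reading aid, the convex combination described above can be written out as follows. The notation is an assumption made here for illustration and is not taken from the paper: k is the number of constituent monomers, M_i is the i-th monomer, f(M_i) is its descriptor vector, and r_i is its mixing ratio.

```latex
x_{\mathrm{MV}} \;=\; \sum_{i=1}^{k} r_i \, f(M_i),
\qquad r_i \ge 0, \quad \sum_{i=1}^{k} r_i = 1 .
```

When the ratios r_i are prescribed constants, each entry of x_MV is a linear function of the entries of the f(M_i). This suggests why the representation fits an MILP formulation, provided the monomer descriptors themselves can be expressed with linear constraints.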

📝 Abstract

A novel two-phase molecule inference framework, mol-infer, has recently been developed to infer chemical graphs with prescribed abstract structures and desired property values through mixed integer linear programming (MILP) under the two-layered model, with guaranteed optimality and exactness relative to the given learned prediction function and structural constraints. In this study, we extend this framework to copolymers by introducing a simple feature representation, called the mixing vector (MV) model. In the proposed model, a copolymer feature vector is represented as a convex combination of MILP-tractable monomer descriptors weighted by the mixing ratio of the constituent monomers. This representation does not require explicit sequence-class information and is therefore naturally compatible with MILP-based inverse design. Under this model, we construct prediction functions for several copolymer property datasets using artificial neural networks, reduced quadratic multiple linear regression, and random forests. The proposed representation achieves practically useful predictive performance across multiple physicochemical property datasets; in particular, the best test R^2 score exceeds 0.7 for nine of the ten datasets and exceeds 0.9 for six datasets. We also formulate a multi-monomer inverse-design problem under the MV representation with a prescribed mixing ratio and show that the resulting MILP instances remain tractable, even for three-monomer settings. Finally, we perform an external consistency check by re-evaluating the inferred candidates and comparing the re-computed property values with those predicted by the learned model. Overall, the proposed framework gives a tractable first step toward model-level exact inverse design of copolymers under the two-layered model.
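
To make the feature representation concrete, the following is a minimal sketch of how a mixing vector could be computed from monomer descriptors and a prescribed mixing ratio. The function name `mixing_vector`, the three-entry descriptors, and the ratio values are all hypothetical and chosen only for illustration; the paper's actual descriptors come from the two-layered model and are not reproduced here.

```python
from typing import List, Sequence


def mixing_vector(
    descriptors: Sequence[Sequence[float]],
    ratios: Sequence[float],
    tol: float = 1e-9,
) -> List[float]:
    """Return the convex combination of monomer descriptors.

    Hypothetical helper: one descriptor vector per monomer, weighted by
    that monomer's mixing ratio.
    """
    if len(descriptors) != len(ratios):
        raise ValueError("need exactly one ratio per monomer")
    # A convex combination needs non-negative weights that sum to 1.
    if any(r < 0 for r in ratios) or abs(sum(ratios) - 1.0) > tol:
        raise ValueError("ratios must be non-negative and sum to 1")
    dim = len(descriptors[0])
    if any(len(d) != dim for d in descriptors):
        raise ValueError("all descriptor vectors must have the same length")
    # Entry j of the result is the ratio-weighted sum of entry j
    # across all monomers.
    return [
        sum(r * d[j] for r, d in zip(ratios, descriptors))
        for j in range(dim)
    ]


# Toy three-monomer example; the descriptor values are made up.
monomer_a = [1.0, 0.0, 4.0]
monomer_b = [0.0, 2.0, 8.0]
monomer_c = [2.0, 2.0, 0.0]
ratios = [0.5, 0.25, 0.25]

copolymer_features = mixing_vector([monomer_a, monomer_b, monomer_c], ratios)
print(copolymer_features)  # [1.0, 1.0, 4.0]
```

The resulting vector has the same length as a single monomer descriptor regardless of how many monomers are mixed, so the same kind of prediction function can take two-monomer and three-monomer inputs.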

Problem

Research questions and friction points this paper is trying to address.

copolymer

inverse design

mixing vector

property prediction

molecular inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

mixing vector model

copolymer inverse design

mixed integer linear programming