MolSight: Molecular Property Prediction with Images

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the computational cost and data-engineering overhead of conventional molecular property prediction methods, which typically rely on molecular graphs, 3D conformers, or billion-parameter language models. The authors present what they describe as the first systematic large-scale study of a purely vision-based paradigm, evaluating 10 vision architectures and 7 pre-training strategies on 10 downstream tasks with 2 million molecule images. They also propose a chemistry-informed curriculum in which five structural complexity descriptors partition the pre-training corpus into five tiers of increasing chemical difficulty; it consistently outperforms non-curriculum baselines. The results indicate that a single rendered bond-line image, processed by a vision encoder, is sufficient for competitive prediction: the best curriculum-trained configuration ranks first on 5 of 10 benchmarks and in the top two on all 10, at 80× lower FLOPs than the nearest multi-modal competitor.
📝 Abstract
Every molecule ever synthesised can be drawn as a 2D skeletal diagram, yet in modern property prediction this universally available representation has received less attention than molecular graphs, 3D conformers, or billion-parameter language models, each of which imposes its own computational and data-engineering overhead. We present $\textbf{MolSight}$, the first systematic large-scale study of vision-based Molecular Property Prediction (MPP). Using 10 vision architectures, 7 pre-training strategies, and 2M molecule images, we evaluate performance across 10 downstream tasks spanning physical-property regression, drug-discovery classification, and quantum-chemistry prediction. To account for the wide variation in structural complexity across pre-training molecules, we further propose a $\textbf{chemistry-informed curriculum}$: five structural complexity descriptors partition the corpus into five tiers of increasing chemical difficulty, and the resulting curriculum consistently outperforms non-curriculum baselines. We show that a single rendered bond-line image, processed by a vision encoder, is sufficient for competitive molecular property prediction, i.e. $\textit{chemical insight from sight alone}$. The best curriculum-trained configuration achieves the top result on $\textbf{5 of 10}$ benchmarks and is in the top two on $\textbf{all 10}$, at $80\times$ lower FLOPs than the nearest multi-modal competitor.
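The tiering idea behind the curriculum can be illustrated with a minimal sketch. The abstract does not name the five descriptors or say how they are combined, so everything below is an assumption made for illustration: each molecule carries five already-normalised descriptor values, `complexity_score` averages them, and `assign_tiers` splits the corpus into five equal-sized tiers by rank. The function names are hypothetical and this is not the authors' implementation.

```python
from typing import List, Sequence


def complexity_score(descriptors: Sequence[float]) -> float:
    """Collapse descriptor values into one score.

    Assumption: a plain mean of pre-normalised values. The paper's actual
    descriptors and weighting are not given in the abstract.
    """
    return sum(descriptors) / len(descriptors)


def assign_tiers(scores: Sequence[float], n_tiers: int = 5) -> List[int]:
    """Rank-based partition into n_tiers roughly equal tiers (0 = simplest)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    tiers = [0] * len(scores)
    for rank, idx in enumerate(order):
        # Integer division maps ranks 0..N-1 onto tier ids 0..n_tiers-1.
        tiers[idx] = min(rank * n_tiers // len(scores), n_tiers - 1)
    return tiers


def curriculum_order(tiers: Sequence[int]) -> List[int]:
    """Sample indices ordered tier by tier, easiest tier first."""
    # sorted() is stable, so the original order is kept inside each tier.
    return sorted(range(len(tiers)), key=lambda i: tiers[i])


if __name__ == "__main__":
    # Ten toy molecules with hypothetical, already-computed scores.
    scores = [5.0, 1.0, 3.0, 2.0, 4.0, 0.5, 9.0, 7.0, 8.0, 6.0]
    tiers = assign_tiers(scores)
    print(tiers)
    print(curriculum_order(tiers))
```

A rank-based split keeps the tiers balanced in size regardless of how the scores are distributed. Fixed thresholds on each descriptor would be an equally plausible reading of "partition the corpus into five tiers".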
Problem

Research questions and friction points this paper is trying to address.

Molecular Property Prediction
2D Molecular Images
Vision-based Prediction
Structural Complexity
Chemistry-informed Curriculum
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-based Molecular Property Prediction
Chemistry-informed Curriculum Learning
Molecular Image Representation
Bond-line Diagram
Efficient MPP