🤖 AI Summary
Existing machine learning models for molecules and materials often lack physical transferability and fail to capture essential quantum-mechanical interactions between atomic pairs, such as those governing chemical bonding and charge redistribution.
Method: We propose a multimodal benchmark grounded in two-body quantum systems as the fundamental modeling unit. Crucially, we embed orbital-physics priors into visual representations via ten-channel atomic-pair images encoding orbital densities, co-occupancy maps, and charge projections, which implicitly capture spatial, angular, and electrostatic symmetries without explicit coordinate inputs. We benchmark GATv2, EGNN, and DimeNet architectures, training them to learn jointly from these images and 18-dimensional physical labels.
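To make the image modality concrete, here is a minimal toy sketch of what a ten-channel atomic-pair image could look like: two atoms on a 2D grid, with per-atom orbital-density-like channels, a co-occupancy map, and a crude charge projection. The function name, channel layout, and density formulas are illustrative assumptions, not the paper's actual construction (which uses l- and m-resolved orbital densities and angular field transforms).

```python
import numpy as np

def pair_image(z_a, z_b, separation=2.0, size=64, extent=6.0):
    """Toy ten-channel pair image (hypothetical layout, NOT the paper's):
    channels 0-3: density proxies for atom A (s-like, p_x-like, p_y-like, radial),
    channels 4-7: the same for atom B,
    channel 8: co-occupancy map, channel 9: crude charge projection."""
    xs = np.linspace(-extent, extent, size)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    img = np.zeros((10, size, size))
    for i, (x0, z) in enumerate([(-separation / 2, z_a), (separation / 2, z_b)]):
        r2 = (X - x0) ** 2 + Y ** 2
        r = np.sqrt(r2) + 1e-9            # avoid division by zero at the nucleus
        dens = np.exp(-r2 / np.sqrt(z))   # isotropic s-like density proxy
        img[4 * i + 0] = dens
        img[4 * i + 1] = dens * (X - x0) / r   # p_x-like angular modulation
        img[4 * i + 2] = dens * Y / r          # p_y-like angular modulation
        img[4 * i + 3] = dens * r              # radial-moment channel
    img[8] = img[0] * img[4]                   # co-occupancy of the two s clouds
    img[9] = img[0] * z_a - img[4] * z_b       # signed charge-projection proxy
    return img

img = pair_image(z_a=6, z_b=8)  # e.g. a C-O pair
print(img.shape)                # (10, 64, 64)
```

The key design point this illustrates is that all channels are functions of relative position only, so the representation needs no explicit coordinate inputs.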
Results: Our approach achieves state-of-the-art performance across 18 tasks (e.g., GATv2 yields an energy-gap MAE of 0.201 eV). Pretraining significantly accelerates convergence and improves generalization on downstream benchmarks including QM9, MD17, and CrysMTM, while maintaining high accuracy, strong interpretability, and physical consistency.
📝 Abstract
Despite rapid advances in molecular and materials machine learning, most models still lack physical transferability: they fit correlations across whole molecules or crystals rather than learning the quantum interactions between atomic pairs. Yet bonding, charge redistribution, orbital hybridization, and electronic coupling all emerge from these two-body interactions that define local quantum fields in many-body systems. We introduce QuantumCanvas, a large-scale multimodal benchmark that treats two-body quantum systems as foundational units of matter. The dataset spans 2,850 element-element pairs, each annotated with 18 electronic, thermodynamic, and geometric properties and paired with ten-channel image representations derived from l- and m-resolved orbital densities, angular field transforms, co-occupancy maps, and charge-density projections. These physically grounded images encode spatial, angular, and electrostatic symmetries without explicit coordinates, providing an interpretable visual modality for quantum learning. Benchmarking eight architectures across 18 targets, we report mean absolute errors of 0.201 eV on energy gap using GATv2, 0.265 eV on HOMO and 0.274 eV on LUMO using EGNN. For energy-related quantities, DimeNet attains 2.27 eV total-energy MAE and 0.132 eV repulsive-energy MAE, while a multimodal fusion model achieves a 2.15 eV Mermin free-energy MAE. Pretraining on QuantumCanvas further improves convergence stability and generalization when fine-tuned on larger datasets such as QM9, MD17, and CrysMTM. By unifying orbital physics with vision-based representation learning, QuantumCanvas provides a principled and interpretable basis for learning transferable quantum interactions through coupled visual and numerical modalities. Dataset and model implementations are available at https://github.com/KurbanIntelligenceLab/QuantumCanvas.
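As a rough sketch of how the coupled visual and numerical modalities described above might be consumed, the snippet below builds a hypothetical QuantumCanvas-style record (ten-channel image plus an 18-dimensional property vector) and a minimal late-fusion feature extractor. The field names, shapes, and pooling scheme are assumptions for illustration, not the released dataset schema or the paper's fusion model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical record layout; real field names/shapes may differ.
sample = {
    "pair": ("C", "O"),
    "image": rng.standard_normal((10, 64, 64)),  # ten-channel pair image
    "labels": rng.standard_normal(18),           # 18 property targets
}

def late_fusion_features(image, extra):
    """Pool each image channel to a scalar via a global mean, then
    concatenate extra numeric descriptors: a minimal multimodal fusion."""
    pooled = image.mean(axis=(1, 2))             # (10,) one scalar per channel
    return np.concatenate([pooled, extra])

# Atomic numbers as assumed extra numeric descriptors.
feats = late_fusion_features(sample["image"], np.array([6.0, 8.0]))
print(feats.shape)  # (12,)
```

A downstream regressor would map such fused features to the 18 targets; the benchmark's stronger baselines (GATv2, EGNN, DimeNet) replace the mean pooling with learned graph and image encoders.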