🤖 AI Summary
Current AI/ML models in materials science are trained predominantly on ideal bulk crystals, so they transfer poorly to real two-dimensional (2D) materials, where surfaces, interfaces, and defects dominate. To bridge this gap, the authors introduce Mat3ra-2D, an open-source framework for generating AI/ML-ready structural data for realistic 2D materials and heterogeneous interfaces. The framework combines a modular materials data standard, configuration-builder pipelines that preserve provenance and metadata, and reusable Jupyter notebook templates, enabling traceable and reproducible generation of orientation-specific slabs and strain-matched interfaces with support for defects and disorder. Because the examples run in any web browser, Mat3ra-2D lowers the barrier to preparing data for AI/ML applications in complex 2D material systems and supports systematic, reusable construction of realistic structures.
📝 Abstract
Artificial intelligence (AI) and machine learning (ML) models in materials science are predominantly trained on ideal bulk crystals, limiting their transferability to real-world applications where surfaces, interfaces, and defects dominate. We present Mat3ra-2D, an open-source framework for the rapid design of realistic two-dimensional materials and related structures, including slabs and heterogeneous interfaces, with support for disorder and defect-driven complexity. The approach combines (1) well-defined standards for storing and exchanging materials data, with a modular implementation of core concepts, and (2) transformation workflows expressed as configuration-builder pipelines that preserve provenance and metadata. We implement typical structure generation tasks, such as constructing orientation-specific slabs or strain-matching interfaces, in reusable Jupyter notebooks that serve as both interactive documentation and templates for reproducible runs. To lower the barrier to adoption, we design the examples to run in any web browser and demonstrate how to incorporate these developments into a web application. Mat3ra-2D enables systematic creation and organization of realistic, AI/ML-ready datasets that account for 2D structures and interfaces.
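The configuration-builder idea described in the abstract can be illustrated with a minimal sketch. This is not the Mat3ra-2D API: the names `SlabConfiguration`, `SlabBuilder`, `Material`, and `apply_uniform_strain` are hypothetical, and no atomic coordinates are generated. The sketch only shows how a declarative configuration and each later transformation might be recorded in a material's metadata so that the structure's provenance stays traceable.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SlabConfiguration:
    """Hypothetical declarative description of the slab to build."""
    bulk_name: str
    miller_indices: Tuple[int, int, int]
    number_of_layers: int
    vacuum_angstrom: float


@dataclass
class Material:
    """Hypothetical material record; real structures would also hold a lattice and atoms."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SlabBuilder:
    """Hypothetical builder: turns a configuration into a material and logs how it was made."""

    def build(self, config: SlabConfiguration) -> Material:
        h, k, l = config.miller_indices
        name = f"{config.bulk_name}({h}{k}{l}), {config.number_of_layers} layers"
        # Store the full configuration so the result can be regenerated later.
        record = {"builder": type(self).__name__, "configuration": asdict(config)}
        return Material(name=name, metadata={"build": [record]})


def apply_uniform_strain(material: Material, strain: float) -> Material:
    """Return a new material whose build history has one more step appended."""
    record = {"builder": "apply_uniform_strain", "configuration": {"strain": strain}}
    history = material.metadata.get("build", []) + [record]  # copy, do not mutate the input
    return Material(name=material.name, metadata={**material.metadata, "build": history})


config = SlabConfiguration("Si", (0, 0, 1), number_of_layers=4, vacuum_angstrom=10.0)
slab = SlabBuilder().build(config)
strained = apply_uniform_strain(slab, strain=0.02)

for step in strained.metadata["build"]:
    print(step["builder"], step["configuration"])
```

Keeping the configuration immutable and returning a new `Material` from each step is one way to make every intermediate structure reproducible from its recorded history; the actual framework may organize this differently.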