🤖 AI Summary
Current generative models for inorganic crystals lack a standardized evaluation framework, which hinders fair comparison and reliability assessment. To address this, we propose LeMat-GenBench, a unified benchmark for crystal generation featuring a multidimensional metric suite that evaluates structural stability, chemical-space diversity, and generation novelty. We release an open-source evaluation toolkit and a public leaderboard on Hugging Face. Using LeMat-GenBench, we systematically evaluate 12 state-of-the-art generative models and uncover a fundamental trade-off among stability, diversity, and novelty: on average, gains in stability come at the cost of novelty and diversity, and no model excels across all dimensions. All code, datasets, and benchmark results are fully open-sourced to ensure reproducibility, extensibility, and continued model iteration. LeMat-GenBench establishes a rigorous, community-driven foundation for evaluating generative models in materials design.
📝 Abstract
Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet the lack of standardized evaluation frameworks makes it difficult to meaningfully assess, compare, and further develop these models. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that, on average, gains in stability come at the cost of novelty and diversity, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.
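To make the three metric axes concrete, here is a minimal sketch of how stability, novelty, and diversity rates could be scored for a batch of generated structures. This is an illustrative assumption, not the actual LeMat-GenBench evaluation suite: the `evaluate` function, the 0.1 eV/atom hull cutoff, and the use of pymatgen's `StructureMatcher` with default tolerances are all choices made for this example.

```python
"""Hypothetical sketch of the stability / novelty / diversity axes.
Assumes pymatgen is installed; names and thresholds are illustrative."""
from pymatgen.analysis.structure_matcher import StructureMatcher


def evaluate(generated, e_above_hull, reference, stability_cutoff=0.1):
    # generated / reference: lists of pymatgen Structure objects
    # e_above_hull: predicted energy above hull (eV/atom), one per generated structure
    matcher = StructureMatcher()  # default tolerances; a real benchmark would tune these

    # Stability: fraction of structures within the hull-distance cutoff.
    stable = sum(e <= stability_cutoff for e in e_above_hull) / len(generated)

    # Novelty: fraction of structures matching nothing in the reference set.
    novel = sum(
        not any(matcher.fit(s, ref) for ref in reference) for s in generated
    ) / len(generated)

    # Diversity (one simple proxy): fraction of pairwise-unique structures.
    unique = []
    for s in generated:
        if not any(matcher.fit(s, u) for u in unique):
            unique.append(s)
    diverse = len(unique) / len(generated)

    return {"stability": stable, "novelty": novel, "diversity": diverse}
```

Reporting the three rates jointly, rather than a single aggregate score, is what exposes the trade-off described above: a model can raise its stability rate simply by regenerating known stable structures, which depresses both novelty and diversity.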