CrystalICL: Enabling In-Context Learning for Crystal Generation

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) for crystal generation support only zero-shot inference, lacking mechanisms for few-shot generation, i.e., leveraging a small number of target-property examples to guide structural design. To address this gap, the authors propose CrystalICL, the first in-context-learning-enabled few-shot crystal generation framework. The approach introduces three key components: (1) a space-group-based crystal tokenization scheme that simplifies symmetry-preserving structural representation; (2) a condition-structure-aware hybrid instruction tuning framework; and (3) a multi-task instruction tuning strategy. Evaluated on four standard crystal generation benchmarks, CrystalICL achieves significant improvements over state-of-the-art methods in both conditional and unconditional generation settings. Notably, it is the first to enable example-driven, controllable, and efficient crystal structure modeling, demonstrating robust generalization from limited property-structure exemplars while preserving physical validity and symmetry constraints.

📝 Abstract
Designing crystal materials with desired physicochemical properties remains a fundamental challenge in materials science. While large language models (LLMs) have demonstrated strong in-context learning (ICL) capabilities, existing LLM-based crystal generation approaches are limited to zero-shot scenarios and are unable to benefit from few-shot examples. In contrast, human experts typically design new materials by modifying relevant known structures, which aligns closely with the few-shot ICL paradigm. Motivated by this, we propose CrystalICL, a novel model designed for few-shot crystal generation. Specifically, we introduce a space-group-based crystal tokenization method, which effectively reduces the complexity of modeling crystal symmetry in LLMs. We further introduce a condition-structure-aware hybrid instruction tuning framework and a multi-task instruction tuning strategy, enabling the model to better exploit ICL by capturing structure-property relationships from limited data. Extensive experiments on four crystal generation benchmarks demonstrate the superiority of CrystalICL over the leading baseline methods on conditional and unconditional generation tasks.
Problem

Research questions and friction points this paper is trying to address.

Enabling few-shot in-context learning for crystal generation
Reducing complexity of modeling crystal symmetry in LLMs
Capturing structure-property relationships from limited data
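The few-shot ICL setting described above amounts to conditioning generation on a handful of property-structure exemplars. A minimal sketch of what such prompt assembly could look like is below; the function name, the `Property:`/`Structure:` template, and the placeholder exemplars are all illustrative assumptions, not the paper's actual prompt schema.

```python
# Hypothetical few-shot ICL prompt assembly: k (property, structure)
# exemplars followed by a query property whose structure the model
# should complete. The string format is an assumption for illustration.

def build_fewshot_prompt(exemplars, query_property):
    """Assemble an in-context prompt from (property, structure) pairs."""
    parts = []
    for prop, structure in exemplars:
        parts.append(f"Property: {prop}\nStructure: {structure}")
    # The query ends with an open "Structure:" slot for the model to fill.
    parts.append(f"Property: {query_property}\nStructure:")
    return "\n\n".join(parts)

# Example: two exemplars guiding generation toward a new band-gap target.
exemplars = [
    ("band_gap=1.1 eV", "<crystal tokens for Si>"),
    ("band_gap=1.4 eV", "<crystal tokens for GaAs>"),
]
prompt = build_fewshot_prompt(exemplars, "band_gap=2.0 eV")
```

This mirrors how a human expert would point at two known structures with nearby property values before asking for a new one.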
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot crystal generation model
Space-group-based tokenization method
Hybrid instruction tuning framework
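The tokenization idea in the contributions above can be sketched as follows: rather than emitting every atom, serialize the space-group number, lattice parameters, and only the symmetry-inequivalent (Wyckoff) sites. Everything here is a hedged illustration of the general idea, not CrystalICL's actual vocabulary; the token names, dataclass layout, and precision choices are assumptions.

```python
# Hypothetical space-group-aware crystal tokenization. Encoding the space
# group explicitly lets a model preserve symmetry without generating every
# symmetry-equivalent atom. Token formats here are illustrative only.

from dataclasses import dataclass

@dataclass
class WyckoffSite:
    element: str        # chemical symbol
    letter: str         # Wyckoff position label, e.g. "8a"
    frac_coords: tuple  # fractional coordinates of the representative atom

def tokenize_crystal(space_group, lattice, sites):
    """Serialize a crystal into a flat token sequence."""
    tokens = [f"<sg_{space_group}>"]
    # Lattice parameters: a, b, c, alpha, beta, gamma.
    tokens += [f"<lat_{p:.2f}>" for p in lattice]
    for s in sites:
        tokens.append(f"<el_{s.element}>")
        tokens.append(f"<wy_{s.letter}>")
        tokens += [f"<fc_{c:.3f}>" for c in s.frac_coords]
    return tokens

# Diamond-cubic silicon: space group 227; a single Wyckoff site (8a)
# stands in for all eight atoms of the conventional cell.
si = tokenize_crystal(
    227,
    (5.43, 5.43, 5.43, 90.0, 90.0, 90.0),
    [WyckoffSite("Si", "8a", (0.0, 0.0, 0.0))],
)
```

The payoff is sequence length and validity: one Wyckoff site replaces many equivalent atoms, and any sequence beginning with a valid space-group token is symmetry-constrained by construction.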