A collaborative digital twin built on FAIR data and compute infrastructure

📅 2025-06-24
🤖 AI Summary
This study addresses fragmented data sharing and delayed model updates among geographically distributed teams running self-driving laboratories (SDLs). It proposes a collaborative digital twin framework grounded in the FAIR principles, with four components: (1) a nanoHUB Sim2L that automatically extracts derived quantities from submitted data and indexes all inputs and outputs; (2) ResultsDB, a FAIR data repository that stores the results; (3) a web interface for submitting data from multiple sources; and (4) a nanoHUB workflow that uses active learning, retraining machine learning models on all existing data to guide the selection of future experiments. Inspired by the “frugal twin” concept, the framework is demonstrated on a low-cost task: finding the food-dye recipe that reproduces a target color. The authors state that the tools are generally applicable and can be extended to other optimization problems.

📝 Abstract
The integration of machine learning with automated experimentation in self-driving laboratories (SDLs) offers a powerful approach to accelerate discovery and optimization tasks in science and engineering applications. When supported by findable, accessible, interoperable, and reusable (FAIR) data infrastructure, SDLs with overlapping interests can collaborate more effectively. This work presents a distributed SDL implementation built on nanoHUB services for online simulation and FAIR data management. In this framework, geographically dispersed collaborators conducting independent optimization tasks contribute raw experimental data to a shared central database. These researchers can then benefit from analysis tools and machine learning models that automatically update as additional data become available. New data points are submitted through a simple web interface and automatically processed using a nanoHUB Sim2L, which extracts derived quantities and indexes all inputs and outputs in a FAIR data repository called ResultsDB. A separate nanoHUB workflow enables sequential optimization using active learning: researchers define the optimization objective, and machine learning models are trained on the fly with all existing data to guide the selection of future experiments. Inspired by the concept of a "frugal twin", the optimization task seeks the recipe for combining food dyes that achieves a desired target color. With easily accessible and inexpensive materials, researchers and students can set up their own experiments, share data with collaborators, and explore the combination of FAIR data, predictive ML models, and sequential optimization. The tools introduced are generally applicable and can easily be extended to other optimization problems.
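The sequential optimization loop described in the abstract can be illustrated with a toy sketch. This is not the authors' nanoHUB workflow, and it does not call Sim2L or ResultsDB: the in-memory `database` list stands in for the shared repository, `run_experiment` is a made-up dye-mixing rule replacing a real measurement, and the nearest-neighbour surrogate, the `explore` weight, and the `TARGET` color are all assumptions chosen only to keep the example self-contained.

```python
import math
import random

TARGET = (0.2, 0.6, 0.4)  # hypothetical target color, RGB in [0, 1]

def run_experiment(recipe):
    """Stand-in for a real measurement: an invented absorption-style mixing rule."""
    red, yellow, blue = recipe  # dye fractions
    return (
        math.exp(-2.0 * blue),                 # red channel, absorbed by blue dye
        math.exp(-2.0 * red - 0.5 * blue),     # green channel
        math.exp(-2.0 * yellow - 1.0 * red),   # blue channel
    )

def color_error(color, target=TARGET):
    """Optimization objective: Euclidean distance to the target color."""
    return math.dist(color, target)

def random_recipe(rng):
    """Sample three non-negative dye fractions that sum to 1."""
    cuts = sorted((rng.random(), rng.random()))
    return (cuts[0], cuts[1] - cuts[0], 1.0 - cuts[1])

def predict(recipe, database, k=3):
    """Surrogate model: inverse-distance-weighted average of the k nearest records."""
    nearest = sorted(database, key=lambda rec: math.dist(recipe, rec["recipe"]))[:k]
    weights = [1.0 / (math.dist(recipe, rec["recipe"]) + 1e-9) for rec in nearest]
    total = sum(weights)
    return tuple(
        sum(w * rec["color"][c] for w, rec in zip(weights, nearest)) / total
        for c in range(3)
    )

def suggest(database, rng, n_candidates=200, explore=0.1):
    """Choose the candidate with the lowest predicted error minus a novelty bonus."""
    def score(recipe):
        novelty = min(math.dist(recipe, rec["recipe"]) for rec in database)
        return color_error(predict(recipe, database)) - explore * novelty
    return min((random_recipe(rng) for _ in range(n_candidates)), key=score)

def optimize(n_initial=5, n_iterations=20, seed=0):
    rng = random.Random(seed)
    database = []  # stands in for the shared data repository
    for _ in range(n_initial):
        recipe = random_recipe(rng)
        database.append({"recipe": recipe, "color": run_experiment(recipe)})
    for _ in range(n_iterations):
        # The surrogate is rebuilt from every record each time it is queried.
        recipe = suggest(database, rng)
        database.append({"recipe": recipe, "color": run_experiment(recipe)})
    return database

database = optimize()
best = min(database, key=lambda rec: color_error(rec["color"]))
print(best["recipe"], color_error(best["color"]))
```

Because the surrogate reads the whole record list on each query, any record appended by another contributor would influence the next suggestion, which is the property the shared-database design relies on. A real deployment would replace the surrogate and the acquisition rule with whatever models the workflow actually trains.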
Problem

Research questions and friction points this paper is trying to address.

Accelerate discovery via FAIR data and self-driving labs
Enable collaborative optimization using shared FAIR databases
Develop frugal digital twin for color recipe optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative digital twin using FAIR data
Machine learning with automated experimentation
Distributed SDL implementation on nanoHUB
Thomas M. Deucher
School of Materials Engineering, Purdue University, West Lafayette, Indiana, 47907 USA
Juan C. Verduzco
School of Materials Engineering, Purdue University, West Lafayette, Indiana, 47907 USA
Michael Titus
School of Materials Engineering, Purdue University, West Lafayette, Indiana, 47907 USA
Alejandro Strachan
Reilly Professor of Materials Engineering, Purdue University
Predictive simulations of materials, Multiscale modeling, Theoretical materials science