🤖 AI Summary
The scarcity of real-world data severely hinders the widespread adoption of subsymbolic AI. To address this challenge, this work proposes a unified reference framework based on digital twins for systematically designing and analyzing simulation-based synthetic data generation methods for AI training. By integrating digital twin technology, high-fidelity simulation, and synthetic data generation, the framework delineates the core components, advantages, and key challenges of the approach, offering a methodological foundation for producing high-quality, reproducible training data. The study addresses the lack of systematic guidance for synthetic data generation and provides a scalable, reusable technical pathway for reducing reliance on real-world data.
📝 Abstract
As insufficient data volume and quality remain the key impediments to the adoption of modern subsymbolic AI, techniques of synthetic data generation are in high demand. Simulation offers an apt, systematic approach to generating diverse synthetic data. This chapter introduces the reader to the key concepts, benefits, and challenges of simulation-based synthetic data generation for AI training purposes, and to a reference framework to describe, design, and analyze digital twin-based AI simulation solutions.