🤖 AI Summary
Problem: Existing Monte Carlo simulation frameworks predominantly focus on design-level considerations and lack systematic implementations compatible with modern statistical programming languages, portable data formats (e.g., Arrow/Parquet), and distributed computing across multi-node clusters.
Method: This paper introduces the “tidy simulation” framework, built from four components: a tidy simulation grid, a data generation function, an analysis function, and a results table. Keeping data generation and analysis as separate, composable functions follows functional programming principles and the tidy data paradigm, which makes the approach language-agnostic and lets the same simulation run on a single machine or across a multi-node cluster.
Contribution/Results: The framework lets even small simulations be written in a consistent, modular way and scaled to thousands of cluster nodes should the need arise. It also supports the iterative, sometimes exploratory nature of simulation-based experiments, helping researchers implement simulations that are robust, reproducible, and scalable.
📝 Abstract
Monte Carlo simulation studies are at the core of the modern applied, computational, and theoretical statistical literature. Simulation is a broadly applicable research tool, used to collect data on the relative performance of methods or data analysis approaches under a well-defined data-generating process. However, extant literature focuses largely on design aspects of simulation, rather than implementation strategies aligned with the current state of (statistical) programming languages, portable data formats, and multi-node cluster computing.
In this work, I propose tidy simulation: a simple, language-agnostic, yet flexible functional framework for designing, writing, and running simulation studies. It has four components: a tidy simulation grid, a data generation function, an analysis function, and a results table. Using this structure, even the smallest simulations can be written in a consistent, modular way, yet they can be readily scaled to thousands of nodes in a computer cluster should the need arise. Tidy simulation also supports the iterative, sometimes exploratory nature of simulation-based experiments. By adopting the tidy simulation approach, researchers can implement their simulations in a robust, reproducible, and scalable way, which contributes to high-quality statistical science.
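As a rough illustration of how the four components could fit together, the following is a minimal Python sketch, not the paper's own implementation. The condition names (`n`, `effect`), the seeding scheme, and the function names `generate_data` and `analyze` are hypothetical choices made for this example; the abstract does not specify them.

```python
import itertools
import random
import statistics

# 1. Tidy simulation grid: one row per (condition, replication) combination.
#    The factors and their levels are illustrative assumptions.
sample_sizes = [10, 50]
effect_sizes = [0.0, 0.5]
n_reps = 3
grid = [
    {"row_id": i, "n": n, "effect": d, "rep": r}
    for i, (n, d, r) in enumerate(
        itertools.product(sample_sizes, effect_sizes, range(n_reps))
    )
]

# 2. Data generation function: depends only on the grid row and a base seed,
#    so any row can be regenerated in isolation (hypothetical seeding scheme).
def generate_data(row, base_seed=2024):
    rng = random.Random(base_seed + row["row_id"])
    return [rng.gauss(row["effect"], 1.0) for _ in range(row["n"])]

# 3. Analysis function: takes a dataset, returns a flat record of results.
def analyze(data):
    estimate = statistics.fmean(data)
    std_error = statistics.stdev(data) / len(data) ** 0.5
    return {"estimate": estimate, "std_error": std_error}

# 4. Results table: each grid row joined with its analysis output.
results = [{**row, **analyze(generate_data(row))} for row in grid]

for record in results[:3]:
    print(record)
```

Because each row of the grid is processed independently, the final list comprehension in this sketch could be replaced by a parallel map or by one cluster job per row without changing the two functions.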