cardinalR: Generating Interesting High-Dimensional Data Structures

📅 2025-12-19

🤖 AI Summary
Problem: The absence of controllable, reproducible simulation benchmarks for complex high-dimensional structures (linear and nonlinear dependencies, clustering, and anomalies) hinders rigorous evaluation of machine learning methods. Method: We introduce *cardinalR*, an open-source R package that unifies the generation of diverse high-dimensional structures, including nonlinear manifolds and local anomalies. Its core methodology integrates piecewise polynomial and radial basis function representations to construct flexible nonlinear manifolds, and uses Gaussian and t-distribution mixtures to generate clusters and anomalies. Structural properties such as dimensionality, signal-to-noise ratio, and structural strength are parameterized and tunable. Contribution/Results: *cardinalR* provides a standardized, extensible benchmark framework for evaluating dimensionality reduction (e.g., t-SNE, UMAP) and supervised/unsupervised learning algorithms. Because the generating structure is known, it helps researchers check how such methods behave and where they can be improved. The package is accompanied by curated benchmark datasets and usage examples.

📝 Abstract
Simulated high-dimensional data is useful for testing, validating, and improving algorithms used in dimension reduction and in supervised and unsupervised learning. High-dimensional data is characterized by multiple variables that are dependent or associated in some way, for example through linear or nonlinear relationships, clustering, or anomalies. Here we provide new methods for generating a variety of high-dimensional structures using mathematical functions and statistical distributions, organized into the R package cardinalR. Several example data sets are also provided. These will be useful for researchers seeking to better understand how different analytical methods work and how they can be improved, with a special focus on nonlinear dimension reduction methods. This package enriches the existing toolset of benchmark datasets for evaluating algorithms.
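
To make the idea of building structures from "mathematical functions and statistical distributions" concrete, here is a minimal sketch in plain Python. It is not the cardinalR API and does not reproduce the package's methods. The function name `make_curve_with_cluster`, its parameters, and the choice of a helix plus a single Gaussian cluster are all assumptions made for illustration. The sketch embeds a nonlinear one-dimensional curve in the first three of `p` coordinates, fills the remaining coordinates with noise, and adds a separate Gaussian cluster.

```python
import math
import random


def make_curve_with_cluster(n_curve=200, n_cluster=100, p=6, noise_sd=0.05, seed=1):
    """Hypothetical helper: a noisy helix plus one Gaussian cluster in p dimensions."""
    if p < 3:
        raise ValueError("p must be at least 3")
    rng = random.Random(seed)  # fixed seed so the data set is reproducible
    rows, labels = [], []

    # Structure 1: a helix (nonlinear, intrinsically 1-D) in the first three coordinates.
    for _ in range(n_curve):
        t = rng.uniform(0.0, 4.0 * math.pi)
        signal = [math.cos(t), math.sin(t), t / (4.0 * math.pi)]
        signal += [0.0] * (p - 3)  # remaining coordinates carry no structure
        rows.append([s + rng.gauss(0.0, noise_sd) for s in signal])
        labels.append("curve")

    # Structure 2: a spherical Gaussian cluster centred away from the helix.
    centre = [3.0] * p
    for _ in range(n_cluster):
        rows.append([c + rng.gauss(0.0, 0.3) for c in centre])
        labels.append("cluster")

    return rows, labels


rows, labels = make_curve_with_cluster()
print(len(rows), len(rows[0]), labels.count("curve"), labels.count("cluster"))
# prints: 300 6 200 100
```

Because the generating structure and the labels are known, the output of a dimension reduction method applied to `rows` can be checked against them, which is the kind of evaluation the abstract describes.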
Problem

Research questions and friction points this paper is trying to address.

Generates diverse high-dimensional data structures
Tests nonlinear dimension reduction methods
Provides benchmark datasets for algorithm evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

R package cardinalR generates high-dimensional data
Uses mathematical functions and statistical distributions
Focuses on nonlinear dimension reduction evaluation