cardinalR: Generating Interesting High-Dimensional Data Structures

📅 2025-12-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Rigorous evaluation of machine learning methods is hindered by the absence of controllable, reproducible simulation benchmarks for complex high-dimensional structures, including linear and nonlinear dependencies, clustering, and anomalies. Method: We introduce *cardinalR*, an open-source R package that unifies the generation of diverse high-dimensional structures, including nonlinear manifolds and local anomalies. Its core methodology integrates piecewise polynomial and radial basis function representations to construct flexible nonlinear manifolds, and uses Gaussian and t-distribution mixtures to generate clusters and anomalies. All structural properties, including dimensionality, signal-to-noise ratio, and structural strength, are parameterized and tunable. Contribution/Results: *cardinalR* provides a standardized, extensible benchmark framework for evaluating dimensionality reduction methods (e.g., t-SNE, UMAP) and supervised and unsupervised learning algorithms. It aids the validation of model interpretability and is accompanied by curated benchmark datasets and comprehensive usage examples.
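To make the idea of tunable clusters and anomalies concrete, here is a minimal sketch in Python using only the standard library. It is not cardinalR code and does not reproduce the package's API: the function name `clusters_with_anomalies` and all of its parameters are hypothetical, and it uses plain Gaussian clusters with uniformly scattered anomalies, not the Gaussian and t-distribution mixtures mentioned in the summary.

```python
import random


def clusters_with_anomalies(n_per_cluster, centers, sd=0.3,
                            n_anomalies=5, spread=10.0, seed=0):
    """Hypothetical helper: Gaussian clusters plus scattered anomalies.

    centers is a list of cluster centres, each a list of p coordinates.
    Returns (rows, labels); anomalies carry the label -1.
    """
    rng = random.Random(seed)  # seeded so the data set is reproducible
    p = len(centers[0])
    rows, labels = [], []
    for k, center in enumerate(centers):
        for _ in range(n_per_cluster):
            # Each coordinate is drawn around the cluster centre.
            rows.append([rng.gauss(c, sd) for c in center])
            labels.append(k)
    for _ in range(n_anomalies):
        # Anomalies are scattered uniformly over a wide box.
        rows.append([rng.uniform(-spread, spread) for _ in range(p)])
        labels.append(-1)
    return rows, labels


rows, labels = clusters_with_anomalies(100, [[0, 0, 0, 0], [3, 3, 3, 3]])
print(len(rows), labels.count(-1))  # prints "205 5"
```

Here `sd` controls how tight the clusters are and `n_anomalies` how many outlying points are added, which is the kind of structural tuning the summary describes.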

📝 Abstract

Simulated high-dimensional data is useful for testing, validating, and improving algorithms used in dimension reduction and in supervised and unsupervised learning. High-dimensional data is characterized by multiple variables that are dependent or associated in some way, for example through linear or nonlinear relationships, clustering, or anomalies. Here we provide new methods for generating a variety of high-dimensional structures using mathematical functions and statistical distributions, organized into the R package cardinalR. Several example data sets are also provided. These will help researchers better understand how different analytical methods work and how they can be improved, with a special focus on nonlinear dimension reduction methods. This package enriches the existing toolset of benchmark datasets for evaluating algorithms.
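As a rough illustration of generating a structure "using mathematical functions and statistical distributions", the sketch below builds a nonlinear (parabolic) dependence between two variables and pads it with pure-noise dimensions. It is written in Python with the standard library only and is an assumption-laden example, not the cardinalR implementation; the name `curvilinear_with_noise` and its parameters are hypothetical.

```python
import random


def curvilinear_with_noise(n, p, noise_sd=0.05, seed=0):
    """Hypothetical helper: n points in p dimensions (p >= 2).

    The first two coordinates follow a parabola (a nonlinear dependence);
    the remaining p - 2 coordinates are small Gaussian noise.
    """
    if p < 2:
        raise ValueError("p must be at least 2")
    rng = random.Random(seed)  # seeded generator for reproducibility
    points = []
    for _ in range(n):
        x1 = rng.uniform(-1.0, 1.0)
        x2 = x1 ** 2 + rng.gauss(0.0, noise_sd)  # nonlinear signal plus jitter
        noise = [rng.gauss(0.0, noise_sd) for _ in range(p - 2)]
        points.append([x1, x2] + noise)
    return points


data = curvilinear_with_noise(n=500, p=4)
print(len(data), len(data[0]))  # prints "500 4"
```

Because the true structure is known by construction, a dimension reduction method applied to `data` can be judged on whether its low-dimensional layout still shows a single curved band.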

Problem

Research questions and friction points this paper is trying to address.

Generates diverse high-dimensional data structures

Tests nonlinear dimension reduction methods

Provides benchmark datasets for algorithm evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

R package cardinalR generates high-dimensional data

Uses mathematical functions and statistical distributions

Focuses on nonlinear dimension reduction evaluation

🔎 Similar Papers

HUMAP: Hierarchical Uniform Manifold Approximation and Projection