RamanBench: A Large-Scale Benchmark for Machine Learning on Raman Spectroscopy

📅 2026-05-03
📈 Citations: 0
Influential: 0
📄 PDF

career value

220K/year
🤖 AI Summary
This work addresses the longstanding absence of standardized machine learning benchmarks in Raman spectroscopy, which has led to fragmented datasets and inconsistent evaluation practices. To bridge this gap, the authors introduce RamBench—the first large-scale, reproducible benchmark for Raman spectroscopy—comprising 74 datasets and 325,668 spectra across four representative application scenarios. RamBench provides a unified data interface, standardized evaluation protocols, and a live online leaderboard. A systematic evaluation of 28 algorithms—including PLS, RamanNet, TabPFN, and ROCKET—reveals that tabular foundation models generally outperform both domain-specific and gradient-boosting approaches. However, no model demonstrates robust cross-dataset generalization, highlighting a fundamental limitation of current methodologies in the field.
📝 Abstract
Machine Learning (ML) has transformed many scientific fields, yet key applications still lack standardized benchmarks. Raman spectroscopy, a widely used technique for non-invasive molecular analysis, is one such field where progress is limited by fragmented datasets, inconsistent evaluation, and models that fail to capture the structure of spectral data. We introduce RamanBench, the first large-scale, fully reproducible benchmark for ML on Raman spectroscopy, consisting of streamlined data access, evaluation protocols and code, as well as a live leaderboard. It unifies 74 datasets (including 16 first released with this benchmark) across four domains, comprising 325,668 spectra and spanning classification and regression tasks under diverse experimental conditions. We benchmark 28 models under a standardized protocol, including classical methods (e.g., PLS), Raman-specific (e.g., RamanNet), Tabular Foundation Model (TFM) (e.g., TabPFN), and time-series approaches (e.g., ROCKET). TFM consistently outperform domain-specific and gradient boosting baselines, while time-series models remain competitive. However, no method generalizes across datasets, revealing a fundamental gap. Therefore, we invite the community to contribute new approaches to our living benchmark, with the potential to accelerate advances in critical applications such as medical diagnostics, biological research, and materials science.
Problem

Research questions and friction points this paper is trying to address.

Raman spectroscopy
machine learning benchmark
standardized evaluation
spectral data
dataset fragmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Raman spectroscopy
machine learning benchmark
Tabular Foundation Model
reproducible evaluation
spectral data
🔎 Similar Papers
No similar papers found.