Self-supervised and Multi-fidelity Learning for Extended Predictive Soil Spectroscopy

📅 2025-11-19
🤖 AI Summary
Problem: Near-infrared (NIR) soil spectral libraries are typically small and generalize poorly, whereas mid-infrared (MIR) libraries are larger and more diverse but cannot be used directly for low-cost NIR-based soil property prediction. Method: We propose a self-supervised multi-fidelity learning framework built on a variational autoencoder (VAE). The VAE is pretrained on the large, unlabeled MIR spectral library to obtain a compressed latent space. The trained MIR decoder is then frozen and attached to a NIR encoder, which is trained on a smaller set of paired NIR and MIR spectra to learn the NIR-to-MIR mapping, transferring high-fidelity MIR knowledge to NIR prediction. Finally, downstream regression models map original spectra, predicted spectra, and latent embeddings to nine soil properties. Contribution/Results: On an independent gold-standard test set, the framework and its embeddings match or exceed baseline accuracy for all nine properties. Predictions from NIR-to-MIR converted spectra do not reach the accuracy of original MIR spectra but are similar or superior to those of NIR-only models, which suggests the shared latent space can help offset the limited coverage of current NIR libraries.

📝 Abstract
We propose a self-supervised machine learning (SSML) framework for multi-fidelity learning and extended predictive soil spectroscopy based on latent space embeddings. A self-supervised representation was pretrained with the large MIR spectral library and the Variational Autoencoder algorithm to obtain a compressed latent space for generating spectral embeddings. At this stage, only unlabeled spectral data were used, allowing us to leverage the full spectral database and the availability of scan repeats for augmented training. We also froze the trained MIR decoder and reused it for a spectrum conversion task: plugging it into a NIR encoder let us learn the mapping between NIR and MIR spectra, in an attempt to leverage the predictive capabilities contained in the large MIR library with a low-cost portable NIR scanner. This was achieved by using a smaller subset of the KSSL library with paired NIR and MIR spectra. Downstream machine learning models were then trained to map original spectra, predicted spectra, and latent space embeddings to nine soil properties. Model performance was evaluated independently of the KSSL training data using a gold-standard test set, along with regression goodness-of-fit metrics. Compared to baseline models, the proposed SSML and its embeddings yielded similar or better accuracy in all soil property prediction tasks. Predictions derived from the spectrum conversion (NIR to MIR) task did not match the performance of the original MIR spectra but were similar or superior to the predictive performance of NIR-only models, suggesting the unified spectral latent space can effectively leverage the larger and more diverse MIR dataset for prediction of soil properties not well represented in current NIR libraries.
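The abstract names the Variational Autoencoder but not its exact loss, so the following is only a minimal sketch of the standard VAE objective (squared reconstruction error plus a KL term against a standard normal prior) and the reparameterization step. The function names, the `beta` weight, and the array shapes are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def vae_loss(x, x_recon, mu, logvar, beta=1.0):
    """Batch mean of reconstruction error plus beta-weighted KL divergence.

    x, x_recon: (batch, n_wavenumbers) input and reconstructed spectra.
    mu, logvar: (batch, latent_dim) parameters of the approximate posterior.
    beta is a hypothetical weighting knob, not a value from the paper.
    """
    # Squared reconstruction error, summed over the spectral axis
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ) for diagonal Gaussians
    kl = -0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1)
    return float(np.mean(recon + beta * kl))
```

Because the objective needs no soil property labels, every scan in the library, including repeat scans, can contribute to pretraining.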
Problem

Research questions and friction points this paper is trying to address.

Develop self-supervised learning for soil spectroscopy using latent embeddings
Convert NIR to MIR spectra to leverage large MIR library predictions
Predict nine soil properties with improved accuracy using spectral data
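The last step, predicting soil properties from embeddings, can be sketched on synthetic data. The summary does not say which downstream model or which goodness-of-fit metrics were used, so ridge regression, RMSE, and R² below are stand-in assumptions, and the random arrays are placeholders for real latent embeddings and lab measurements.

```python
import numpy as np

def fit_ridge(Z, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), y.mean()
    Zc = Z - z_mean
    w = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ (y - y_mean))
    b = y_mean - z_mean @ w
    return w, b

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-ins: 8-dimensional "embeddings" and one "soil property"
rng = np.random.default_rng(0)
true_w = rng.standard_normal(8)
Z_train = rng.standard_normal((200, 8))
y_train = Z_train @ true_w + 2.0 + 0.01 * rng.standard_normal(200)
Z_test = rng.standard_normal((50, 8))   # held-out set, separate from training
y_test = Z_test @ true_w + 2.0

w, b = fit_ridge(Z_train, y_train, alpha=1e-3)
y_pred = Z_test @ w + b
print(f"RMSE={rmse(y_test, y_pred):.4f}  R2={r2(y_test, y_pred):.4f}")
```

In the setting described above, one such model would be fit per soil property, and the same metrics would be computed on the independent test set for original spectra, converted spectra, and embeddings so the three inputs can be compared directly.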
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised framework for multi-fidelity soil spectroscopy
Variational Autoencoder generates latent spectral embeddings
NIR to MIR spectrum conversion leveraging frozen decoder
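The frozen-decoder idea can be illustrated with a linear toy model: a fixed "MIR decoder" matrix is reused unchanged while only a "NIR encoder" matrix is trained on paired spectra. Everything here is an assumption for illustration (linear maps instead of neural networks, synthetic spectra, plain gradient descent, the chosen dimensions and learning rate); it is not the architecture or training setup from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_nir, d_mir, d_lat = 64, 20, 30, 4

# Synthetic paired spectra generated from a shared latent variable (toy data)
latent = rng.standard_normal((n, d_lat))
nir = latent @ rng.standard_normal((d_lat, d_nir))
W_dec = rng.standard_normal((d_lat, d_mir)) / np.sqrt(d_lat)  # stands in for the pretrained MIR decoder
mir = latent @ W_dec

W_dec_frozen = W_dec.copy()        # snapshot used to confirm the decoder never changes
W_enc = np.zeros((d_nir, d_lat))   # trainable NIR encoder, initialized at zero

def conversion_loss(W_enc):
    """Mean squared error between decoded NIR embeddings and the paired MIR spectra."""
    return float(np.mean((nir @ W_enc @ W_dec - mir) ** 2))

loss_before = conversion_loss(W_enc)
lr = 0.005
for _ in range(1000):
    resid = nir @ W_enc @ W_dec - mir                       # shape (n, d_mir)
    grad_enc = 2.0 * nir.T @ resid @ W_dec.T / resid.size   # gradient w.r.t. the encoder only
    W_enc -= lr * grad_enc                                  # the decoder is never updated
loss_after = conversion_loss(W_enc)

print(f"conversion loss: {loss_before:.4f} -> {loss_after:.4f}")
```

Only the encoder receives gradient updates, so the decoder keeps whatever it learned from the larger MIR library. In a deep learning framework the same effect is usually obtained by excluding the decoder parameters from optimization, for example with `requires_grad_(False)` in PyTorch.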
Luning Sun
Lawrence Livermore National Lab
AI for Science, Scientific Machine Learning, Uncertainty Quantification, CFD, Variational Inference
José L. Safanelli
Woodwell Climate Research Center, Falmouth, MA, USA
Jonathan Sanderman
Woodwell Climate Research Center, Falmouth, MA, USA
Katerina Georgiou
Oregon State University, Corvallis, OR, USA
Colby Brungard
New Mexico State University, Las Cruces, NM, USA
Kanchan Grover
New Mexico State University, Las Cruces, NM, USA
Bryan G. Hopkins
Soil Science Society of America — North American Proficiency Testing Program, and Brigham Young University, Provo, UT, USA
Shusen Liu
Lawrence Livermore National Laboratory, Livermore, CA, USA
Timo Bremer
Lawrence Livermore National Laboratory, Livermore, CA, USA