Measurement noise scaling laws for cellular representation learning

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the mechanistic impact of measurement noise—particularly the molecular undersampling noise prevalent in single-cell genomics—on representation learning performance. Method: The authors introduce a “noise scaling axis,” treating measurement noise as a third independent performance-controlling variable, orthogonal to model and data scale. Leveraging information-theoretic principles, they formulate a principled representation quality metric and empirically validate the framework via Gaussian noise modeling, single-cell multi-omics experiments, and cross-modal evaluation (e.g., image noise). Contribution/Results: They find that the noise–performance relationship follows a logarithmic scaling law that is robust across diverse architectures, datasets, and modalities. This law provides a quantitative theoretical foundation for single-cell data generation, quality control, and robust algorithm design.

📝 Abstract
Deep learning scaling laws predict how performance improves with increased model and dataset size. Here we identify measurement noise in data as another performance scaling axis, governed by a distinct logarithmic law. We focus on representation learning models of biological single cell genomic data, where a dominant source of measurement noise is due to molecular undersampling. We introduce an information-theoretic metric for cellular representation model quality, and find that it scales with sampling depth. A single quantitative relationship holds across several model types and across several datasets. We show that the analytical form of this relationship can be derived from a simple Gaussian noise model, which in turn provides an intuitive interpretation for the scaling law. Finally, we show that the same relationship emerges in image classification models with respect to two types of imaging noise, suggesting that measurement noise scaling may be a general phenomenon. Scaling with noise can serve as a guide in generating and curating data for deep learning models, particularly in fields where measurement quality can vary dramatically between datasets.
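The abstract notes that the logarithmic scaling law can be derived from a simple Gaussian noise model. A minimal sketch of that intuition (not the paper's exact derivation; the variances and function names here are illustrative assumptions): if a cell's true state is Gaussian and averaging `n` molecular reads shrinks the measurement-noise variance by `1/n`, the information a measurement carries about the state is `0.5 * log(1 + n * sig_s**2 / sig_n**2)`, which grows as `0.5 * log(n)` once the signal dominates.

```python
import numpy as np

def gaussian_info(n_reads, sig_s=1.0, sig_n=4.0):
    """Mutual information (nats) between a Gaussian state s ~ N(0, sig_s^2)
    and its measurement x = s + eps, where averaging n_reads reads reduces
    the noise variance to sig_n^2 / n_reads. Parameter values are illustrative."""
    snr = n_reads * sig_s**2 / sig_n**2
    return 0.5 * np.log1p(snr)  # 0.5 * log(1 + SNR)

depths = np.array([10, 100, 1000, 10000])
info = gaussian_info(depths)

# In the high-SNR regime each 10x increase in sampling depth adds a
# near-constant increment of ~0.5 * log(10) ≈ 1.15 nats: logarithmic scaling.
print(np.diff(info))
```

Each tenfold increase in depth buys roughly the same information gain, which is the intuitive reading of a logarithmic scaling law in sampling depth.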
Problem

Research questions and friction points this paper is trying to address.

Identifies measurement noise as a key factor in model performance scaling.
Develops a metric for cellular representation quality linked to sampling depth.
Demonstrates noise scaling laws apply across different models and datasets.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies measurement noise as performance scaling axis
Introduces information-theoretic metric for model quality
Derives scaling law from Gaussian noise model
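In practice, a logarithmic law like the one described above can be checked by fitting quality-versus-depth measurements on a log axis. A hedged sketch with synthetic data (the depths, coefficients, and metric values are made up for illustration, not taken from the paper):

```python
import numpy as np

# Hypothetical quality-vs-depth data generated to follow q(n) = a*log(n) + b.
depths = np.array([50.0, 200.0, 800.0, 3200.0, 12800.0])
quality = 0.3 * np.log(depths) + 0.1  # synthetic, for illustration only

# A logarithmic law is linear in log(n), so an ordinary least-squares
# line fit on log-depth recovers the slope a and intercept b.
a, b = np.polyfit(np.log(depths), quality, deg=1)
print(round(a, 3), round(b, 3))  # → 0.3 0.1
```

A straight line on a log-depth axis is the visual signature of the scaling law; deviations from it would flag regimes where the Gaussian noise approximation breaks down.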
G. Gowri
Department of Systems Biology, Harvard University
Peng Yin
Professor of Systems Biology
DNA/RNA nanotechnology, synthetic biology, molecular programming
Allon M. Klein
Department of Systems Biology, Harvard University