Position: AI for Science Should Treat Measurement-to-Dataset Pipelines as Inference Components

📅 2026-05-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of current AI4Science practice: datasets are treated as static interfaces, while the uncertainties and implicit assumptions introduced by the multi-stage pipeline from raw measurements to curated datasets are neglected. The paper proposes domain-specific “Computable Observation Frameworks” that model this pipeline as an auditable, reproducible inference component, recording its configuration, validity conditions, and associated uncertainties. The approach combines scientific workflow analysis, uncertainty quantification, and cross-dataset stability assessment to support domain-specific observation protocols. A large-scale neuroscience audit finds a survival rate of approximately 0.0004% (about 4 in a million) under a cross-dataset stability criterion, which the authors present as evidence that current practice is fragile. They argue that such frameworks are needed to expose hidden assumptions, test transportability, and govern multiplicity.

📝 Abstract

AI for Science (AI4Science) workflows often treat the released dataset as a fixed interface to the underlying system. However, in domains relying on \emph{indirect observation}, the learner observes a derivative representation produced by multi-stage measurement, reconstruction, and preprocessing pipelines. \textbf{We argue that these measurement-to-dataset pipelines are inference components: treating their outputs as ``given data'' freezes an observation model and obscures uncertainty over feasible pipeline choices.} We identify three failure modes arising from this ``frozen lens'': \textbf{(C1) hidden hypothesis space}, where the released dataset does not specify the pipeline configuration or its validity conditions; \textbf{(C2) uncertified transportability}, where a pipeline may be documented but its regime of validity is untested, so failures under distribution shift cannot be adjudicated; \textbf{(C3) ungoverned multiplicity}, where many defensible pipelines exist and dispersion is real but not propagated into uncertainty-aware evidence. We stress-test these claims with a large-scale neuroscience empirical audit, finding a survival rate of $\approx 0.0004\%$ under a cross-dataset stability criterion. We call on the AI4Science community to make pipelines \emph{computable} inference objects via domain-specific Computable Observation Frameworks. This shift enables quantifying pipeline adequacy and stability, converting implicit implementation choices into auditable, reproducible, and cumulative scientific evidence.
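To make the notions of multiplicity (C3) and a cross-dataset survival rate concrete, the following is a minimal sketch and not the authors' audit procedure. Everything in it is an assumption made for illustration: the pipeline options in `PIPELINE_SPACE`, the stand-in `toy_estimate` function (which replaces actually running a pipeline on data), and the stability criterion (sign agreement plus a minimum effect magnitude across all datasets). It enumerates every combination of pipeline choices, checks which ones give a consistent estimate across datasets, and reports both the surviving fraction and the spread of estimates across pipelines.

```python
from itertools import product
from statistics import pstdev

# Hypothetical pipeline choices; option names and values are illustrative only.
PIPELINE_SPACE = {
    "filter_hz": [0.5, 1.0, 2.0],
    "reference": ["average", "mastoid"],
    "artifact_threshold": [50, 100, 150],
}


def enumerate_pipelines(space):
    """Return every combination of choices as a list of config dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def toy_estimate(config, dataset_bias):
    """Stand-in for running one pipeline on one dataset (assumed, not real data).

    Returns a scalar effect estimate that depends on the pipeline choices
    and on a per-dataset offset.
    """
    ref_shift = 0.3 if config["reference"] == "average" else -0.2
    return (
        ref_shift
        + 0.1 * config["filter_hz"]
        - 0.002 * config["artifact_threshold"]
        + dataset_bias
    )


def is_stable(estimates, min_abs=0.05):
    """Assumed criterion: same sign on every dataset and |effect| >= min_abs."""
    same_sign = all(e > 0 for e in estimates) or all(e < 0 for e in estimates)
    return same_sign and all(abs(e) >= min_abs for e in estimates)


def audit(space, dataset_biases, estimate=toy_estimate):
    """Run every pipeline on every dataset and summarize stability and spread."""
    pipelines = enumerate_pipelines(space)
    per_pipeline = [[estimate(p, b) for b in dataset_biases] for p in pipelines]
    survivors = [p for p, est in zip(pipelines, per_pipeline) if is_stable(est)]
    # Dispersion across pipelines within each dataset: the quantity that a
    # single released dataset hides when one pipeline is frozen.
    dispersion = [
        pstdev([est[i] for est in per_pipeline]) for i in range(len(dataset_biases))
    ]
    return {
        "n_pipelines": len(pipelines),
        "n_survivors": len(survivors),
        "survival_rate": len(survivors) / len(pipelines),
        "dispersion_per_dataset": dispersion,
        "survivors": survivors,
    }


if __name__ == "__main__":
    result = audit(PIPELINE_SPACE, dataset_biases=[0.0, 0.1, -0.1])
    print(result["n_pipelines"], result["n_survivors"], result["survival_rate"])
    print(result["dispersion_per_dataset"])
```

In a real audit, `toy_estimate` would be replaced by code that executes the configured pipeline on each dataset, and the stability criterion would be chosen per domain. For scale, a survival rate of 0.0004% corresponds to about 4 survivors per million candidates.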

Problem

Research questions and friction points this paper is trying to address.

AI for Science

measurement-to-dataset pipelines

indirect observation

uncertainty propagation

pipeline stability

Innovation

Methods, ideas, or system contributions that make the work stand out.

measurement-to-dataset pipelines

inference components

uncertainty propagation