PCS Workflow for Veridical Data Science in the Age of AI

📅 2025-06-18
🏛️ arXiv.org
📈 Citations: 4 (Influential: 0)
🤖 AI Summary
Widespread irreproducibility in data science stems in large part from the uncertainty introduced by human judgment calls throughout the data science life cycle (DSLC), which traditional statistical frameworks often fail to account for. To address this, the paper presents an updated and streamlined PCS workflow, grounded in the Predictability-Computability-Stability (PCS) framework for veridical data science and enhanced with guided use of generative AI. A running example shows the framework in action, and a related case study shows how judgment calls made at the data cleaning stage propagate into uncertainty in downstream predictions. The workflow is tailored for practitioners and aims to make data-driven findings more transparent, stable, and reproducible.

📝 Abstract
Data science is a pillar of artificial intelligence (AI), which is transforming nearly every domain of human activity, from the social and physical sciences to engineering and medicine. While data-driven findings in AI offer unprecedented power to extract insights and guide decision-making, many are difficult or impossible to replicate. A key reason for this challenge is the uncertainty introduced by the many choices made throughout the data science life cycle (DSLC). Traditional statistical frameworks often fail to account for this uncertainty. The Predictability-Computability-Stability (PCS) framework for veridical (truthful) data science offers a principled approach to addressing this challenge throughout the DSLC. This paper presents an updated and streamlined PCS workflow, tailored for practitioners and enhanced with guided use of generative AI. We include a running example to display the PCS framework in action, and conduct a related case study which showcases the uncertainty in downstream predictions caused by judgment calls in the data cleaning stage.

Problem

Research questions and friction points this paper is trying to address.

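Choices made throughout the data science life cycle (DSLC) introduce uncertainty that traditional statistical frameworks often fail to account for
Many data-driven findings in AI are difficult or impossible to replicate
Practitioners need a streamlined PCS workflow, with guidance on using generative AI within it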

Innovation

Methods, ideas, or system contributions that make the work stand out.

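Builds on the Predictability-Computability-Stability (PCS) framework, a principled approach to veridical (truthful) data science
Presents an updated, streamlined workflow for practitioners with guided use of generative AI
Includes a running example and a case study showing how data cleaning judgment calls create uncertainty in downstream predictions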
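
The idea that data cleaning judgment calls create uncertainty in downstream predictions can be illustrated with a small sketch. This is not the paper's case study or code. It is a toy example under stated assumptions: the synthetic data, the three cleaning rules, and every function name (`make_data`, `drop_missing`, `impute_mean`, `drop_missing_and_trim`, `predict_under_each_choice`) are hypothetical and chosen only for illustration. The same model is fit after each plausible cleaning choice, and the spread of the resulting predictions gives a rough picture of how much the answer depends on the analyst's choice.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic (x, y) rows with y close to 2x + 1 (assumed toy data).

    About 10% of x values are missing and about 5% of y values are corrupted.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 1)
        if rng.random() < 0.10:
            x = None       # missing predictor value
        if rng.random() < 0.05:
            y += 40.0      # gross outlier in the response
        rows.append((x, y))
    return rows


# Three defensible cleaning choices (hypothetical judgment calls).
def drop_missing(rows):
    """Discard rows whose x is missing."""
    return [(x, y) for x, y in rows if x is not None]


def impute_mean(rows):
    """Replace each missing x with the mean of the observed x values."""
    mean_x = statistics.fmean(x for x, _ in rows if x is not None)
    return [(mean_x if x is None else x, y) for x, y in rows]


def drop_missing_and_trim(rows):
    """Discard rows with missing x, then rows with y outside 1.5-IQR fences."""
    kept = drop_missing(rows)
    q1, _, q3 = statistics.quantiles([y for _, y in kept], n=4)
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    return [(x, y) for x, y in kept if low <= y <= high]


def fit_line(rows):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = statistics.fmean(x for x, _ in rows)
    mean_y = statistics.fmean(y for _, y in rows)
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_under_each_choice(rows, x_new):
    """Fit the same model after each cleaning choice and predict at x_new."""
    choices = {
        "drop_missing": drop_missing,
        "impute_mean": impute_mean,
        "drop_missing_and_trim": drop_missing_and_trim,
    }
    predictions = {}
    for name, clean in choices.items():
        slope, intercept = fit_line(clean(rows))
        predictions[name] = slope * x_new + intercept
    return predictions


if __name__ == "__main__":
    preds = predict_under_each_choice(make_data(), x_new=5.0)
    for name, value in preds.items():
        print(f"{name:>24}: {value:.2f}")
    spread = max(preds.values()) - min(preds.values())
    print(f"spread across cleaning choices: {spread:.2f}")
```

Reporting the range of predictions across reasonable cleaning choices, rather than a single number from one choice, is one simple way to make this source of uncertainty visible. A fuller analysis would also check each pipeline's predictive performance before including it in the comparison.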