AI Summary
Quantitative proteomics urgently requires efficient Bayesian inference to uncover regulatory mechanisms and biomarkers; however, conventional Bayesian methods incur prohibitive computational costs, while existing sequential Monte Carlo (SMC) implementations rely on specialized hardware (e.g., GPUs/FPGAs), limiting affordability and scalability. This paper introduces CondorSMC, the first SMC framework designed for opportunistic computing: it leverages HTCondor to orchestrate idle, heterogeneous resources and adopts a decentralized Coordinator-Manager-Follower architecture that minimizes synchronization and fault-tolerance overheads, eliminating the dependence on dedicated accelerators. Evaluated on real-world proteomics models, CondorSMC achieves near-linear scaling of sample throughput with available resources under fixed time budgets, maintains high inference accuracy, and demonstrates good weak-scaling behaviour. The implementation is open-source.
Abstract
Quantitative proteomics plays a central role in uncovering regulatory mechanisms, identifying disease biomarkers, and guiding the development of precision therapies. These insights are often obtained through complex Bayesian models, whose inference procedures are computationally intensive, especially when applied at scale to biological datasets. This limits the accessibility of advanced modelling techniques needed to fully exploit proteomics data. Although Sequential Monte Carlo (SMC) methods offer a parallelisable alternative to traditional Markov Chain Monte Carlo, their high-performance implementations often rely on specialised hardware, increasing both financial and energy costs. We address these challenges by introducing an opportunistic computing framework for SMC samplers, tailored to the demands of large-scale proteomics inference. Our approach leverages idle compute resources at the University of Liverpool via HTCondor, enabling scalable Bayesian inference without dedicated high-performance computing infrastructure. Central to this framework is a novel Coordinator-Manager-Follower architecture that reduces synchronisation overhead and supports robust operation in heterogeneous, unreliable environments. We evaluate the framework on a realistic proteomics model and show that opportunistic SMC delivers accurate inference and exhibits weak scaling: the number of samples generated under a fixed time budget grows as more resources join. To support adoption, we release CondorSMC, an open-source package for deploying SMC samplers in opportunistic computing environments.
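To make concrete why SMC samplers parallelise so naturally, the following is a minimal sketch of a tempered SMC sampler on a toy problem (inferring the mean of a Gaussian), not the CondorSMC implementation or the proteomics model from the paper. The per-particle weight updates and Metropolis moves are independent across particles, which is the embarrassingly parallel work an opportunistic pool of machines can absorb; only resampling requires synchronisation. All names and parameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy inference problem (assumed, not from the paper): estimate the mean mu
# of a Gaussian with known unit variance from 50 simulated observations.
data = rng.normal(2.0, 1.0, size=50)

def log_lik(mu):
    """Vectorised Gaussian log-likelihood of the data for an array of mu values."""
    return -0.5 * np.sum((data[None, :] - mu[:, None]) ** 2, axis=1)

def log_prior(mu):
    """N(0, 3^2) prior, up to an additive constant."""
    return -0.5 * (mu / 3.0) ** 2

N = 2000                           # number of particles
betas = np.linspace(0.0, 1.0, 21)  # tempering schedule: prior -> posterior
mu = rng.normal(0.0, 3.0, size=N)  # particles initialised from the prior
logw = np.zeros(N)                 # log importance weights

for b_prev, b in zip(betas[:-1], betas[1:]):
    # Weight update: each particle is reweighted independently -- the
    # embarrassingly parallel step that distributed SMC exploits.
    logw += (b - b_prev) * log_lik(mu)

    # Normalise weights and monitor the effective sample size (ESS).
    w = np.exp(logw - logw.max())
    w /= w.sum()
    if 1.0 / np.sum(w ** 2) < N / 2:
        # Multinomial resampling (a synchronisation point), then reset weights.
        mu = mu[rng.choice(N, size=N, p=w)]
        logw = np.zeros(N)

    # One random-walk Metropolis move per particle, targeting the current
    # tempered distribution prior(mu) * likelihood(mu)^b; also parallel.
    prop = mu + 0.5 * rng.normal(size=N)
    log_acc = (log_prior(prop) + b * log_lik(prop)) \
            - (log_prior(mu) + b * log_lik(mu))
    mu = np.where(np.log(rng.uniform(size=N)) < log_acc, prop, mu)

# Weighted posterior-mean estimate from the final particle population.
w = np.exp(logw - logw.max())
w /= w.sum()
post_mean = np.sum(w * mu)
```

For this conjugate Gaussian setup the exact posterior mean is available in closed form, so the particle estimate can be checked directly; in a framework like CondorSMC, the per-particle loop body would instead be farmed out to idle worker nodes between resampling steps.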