Towards Scalable Proteomics: Opportunistic SMC Samplers on HTCondor

📅 2025-09-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Quantitative proteomics urgently requires efficient Bayesian inference to uncover regulatory mechanisms and biomarkers; however, conventional Bayesian methods incur prohibitive computational costs, while existing sequential Monte Carlo (SMC) implementations rely on specialised hardware (e.g., GPUs/FPGAs), limiting affordability and scalability. This paper introduces CondorSMC, the first SMC framework designed for opportunistic computing: it leverages HTCondor to orchestrate idle, heterogeneous resources and adopts a decentralised Coordinator-Manager-Follower architecture that minimises synchronisation and fault-tolerance overhead, eliminating dependence on dedicated accelerators. Evaluated on real-world proteomics models, CondorSMC achieves near-linear scaling of sample throughput with available resources under fixed time budgets, maintains high inference accuracy, and demonstrates strong weak scaling. The implementation is open-source.

๐Ÿ“ Abstract
Quantitative proteomics plays a central role in uncovering regulatory mechanisms, identifying disease biomarkers, and guiding the development of precision therapies. These insights are often obtained through complex Bayesian models, whose inference procedures are computationally intensive, especially when applied at scale to biological datasets. This limits the accessibility of advanced modelling techniques needed to fully exploit proteomics data. Although Sequential Monte Carlo (SMC) methods offer a parallelisable alternative to traditional Markov Chain Monte Carlo, their high-performance implementations often rely on specialised hardware, increasing both financial and energy costs. We address these challenges by introducing an opportunistic computing framework for SMC samplers, tailored to the demands of large-scale proteomics inference. Our approach leverages idle compute resources at the University of Liverpool via HTCondor, enabling scalable Bayesian inference without dedicated high-performance computing infrastructure. Central to this framework is a novel Coordinator-Manager-Follower architecture that reduces synchronisation overhead and supports robust operation in heterogeneous, unreliable environments. We evaluate the framework on a realistic proteomics model and show that opportunistic SMC delivers accurate inference with weak scaling, increasing samples generated under a fixed time budget as more resources join. To support adoption, we release CondorSMC, an open-source package for deploying SMC samplers in opportunistic computing environments.
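This page does not detail the sampler itself; purely to illustrate the class of method the abstract refers to, the sketch below is a minimal single-machine tempered SMC sampler in Python. The Gaussian prior, tempering schedule, proposal scale, and particle counts are illustrative assumptions, not the paper's settings, and the distributed HTCondor orchestration is omitted entirely.

```python
import math
import random

def smc_sampler(log_target, n_particles=500, n_steps=20, seed=0):
    """Minimal tempered SMC sampler: anneal from a broad Gaussian
    prior to log_target via importance weighting, systematic
    resampling, and one random-walk Metropolis move per step."""
    rng = random.Random(seed)
    # Particles drawn from an (assumed) broad Gaussian prior N(0, 5^2).
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]
    log_prior = lambda x: -0.5 * (x / 5.0) ** 2

    for step in range(1, n_steps + 1):
        beta_prev = (step - 1) / n_steps
        beta = step / n_steps
        # Incremental importance weights for the tempering move
        # pi_beta ∝ target^beta * prior^(1 - beta).
        logw = [(beta - beta_prev) * (log_target(x) - log_prior(x))
                for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        total = sum(w)
        w = [wi / total for wi in w]
        # Systematic resampling.
        u = rng.random() / n_particles
        cum, idx, resampled = 0.0, 0, []
        for i in range(n_particles):
            target_u = u + i / n_particles
            while idx < n_particles - 1 and cum + w[idx] < target_u:
                cum += w[idx]
                idx += 1
            resampled.append(particles[idx])
        particles = resampled

        # One random-walk Metropolis move per particle, targeting the
        # current tempered distribution.
        def log_tempered(x):
            return beta * log_target(x) + (1.0 - beta) * log_prior(x)

        moved = []
        for x in particles:
            y = x + rng.gauss(0.0, 0.5)
            if math.log(rng.random() + 1e-300) < log_tempered(y) - log_tempered(x):
                x = y
            moved.append(x)
        particles = moved
    return particles
```

On a toy unit-variance Gaussian target centred at 2, the returned particle cloud concentrates around the target mean, which is the behaviour the distributed framework scales up across opportunistic resources.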
Problem

Research questions and friction points this paper is trying to address.

Scalable Bayesian inference for proteomics data
Reducing computational costs of SMC methods
Enabling parallel processing without specialized hardware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Opportunistic computing framework via HTCondor
Coordinator-Manager-Follower architecture reduces overhead
Open-source CondorSMC enables scalable Bayesian inference
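The paper's actual Coordinator-Manager-Follower protocol runs across HTCondor nodes; the hypothetical sketch below only illustrates the general fan-out pattern such a hierarchy implies, simulating the three roles with threads and retrying batches whose followers fail (as opportunistic nodes may). All function names, batch sizes, and the retry policy are assumptions for illustration, not CondorSMC's API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def follower(batch):
    # A follower processes one batch of particles; squaring stands in
    # for the real per-particle SMC move/weight computation.
    return [x * x for x in batch]

def manager(batches, retries=2):
    # A manager fans batches out to followers and resubmits batches
    # whose followers raise, mimicking loss of an opportunistic node.
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = {pool.submit(follower, b): (b, 0) for b in batches}
        while pending:
            for fut in as_completed(list(pending)):
                batch, attempt = pending.pop(fut)
                try:
                    results.extend(fut.result())
                except Exception:
                    if attempt < retries:
                        pending[pool.submit(follower, batch)] = (batch, attempt + 1)
    return results

def coordinator(particles, n_managers=2):
    # The coordinator partitions the population across managers and
    # only synchronises when gathering results, which is how a
    # hierarchy like this can reduce global synchronisation overhead.
    chunk = max(1, len(particles) // n_managers)
    groups = [particles[i:i + chunk] for i in range(0, len(particles), chunk)]
    out = []
    for group in groups:
        batches = [group[j:j + 4] for j in range(0, len(group), 4)]
        out.extend(manager(batches))
    return out
```

Here each manager synchronises only its own followers; the coordinator sees one result set per manager, so a slow or departing follower stalls at most one subtree rather than the whole population.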
Matthew Carter
Department of Electrical Engineering and Electronics, University of Liverpool, United Kingdom
Lee Devlin
University of Liverpool
Alexander Philips
Department of Electrical Engineering and Electronics, University of Liverpool, United Kingdom
Edward Pyzer-Knapp
Xyme, Manchester, United Kingdom
Paul Spirakis
Professor of Computer Science U. Liverpool and U. Patras
Algorithms, Complexity, Algorithmic Game Theory
Simon Maskell
University of Liverpool