🤖 AI Summary
Soft sensor modeling for cell culture batch processes is hampered by limited historical data, sparse feedback, heterogeneous operating conditions, and high-dimensional sensor signals.
Method: This study systematically compares batch learning, online learning, and just-in-time learning under cold-start and multi-condition scenarios. It proposes a multi-source data fusion strategy that integrates real-time Raman spectra with delayed offline measurements. It also combines feature dimensionality reduction with meta-feature analysis (e.g., feed composition, control strategy) to identify the key factors governing model transferability.
Contribution/Results: Batch learning performs well under homogeneous conditions. Under heterogeneous or cold-start conditions, just-in-time and online learning improve prediction accuracy for critical process variables by 23–41%. Multi-source fusion further improves monitoring robustness and generalization, indicating better adaptability in low-data, dynamic bioprocess environments.
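Just-in-time learning, one of the strategies compared above, postpones model building until a prediction is requested. For each new query it selects the most similar historical samples and fits a small local model on those samples alone.

The following minimal sketch shows one common variant: k-nearest-neighbour selection by Euclidean distance, followed by a local ridge regression. It assumes preprocessed or dimensionality-reduced inputs, such as PCA scores of Raman spectra. The helper `jit_predict`, the distance metric, `k`, and the ridge penalty are hypothetical choices for illustration. They are not the configuration used in this study.

```python
import numpy as np


def jit_predict(X_hist, y_hist, x_query, k=10, ridge=1e-3):
    """Hypothetical just-in-time (lazy, local) prediction for one query sample.

    X_hist:  (n, d) historical inputs, e.g. PCA-reduced spectra (assumed)
    y_hist:  (n,)   matching offline reference values
    x_query: (d,)   new input to predict
    """
    # 1. Rank historical samples by Euclidean distance to the query.
    dist = np.linalg.norm(X_hist - x_query, axis=1)
    idx = np.argsort(dist)[:k]

    # 2. Fit a ridge-regularised linear model on the k nearest neighbours only.
    Xl = np.hstack([np.ones((len(idx), 1)), X_hist[idx]])  # intercept column first
    penalty = ridge * np.eye(Xl.shape[1])
    penalty[0, 0] = 0.0  # leave the intercept unpenalised
    coef = np.linalg.solve(Xl.T @ Xl + penalty, Xl.T @ y_hist[idx])

    # 3. Predict with the local model; it is discarded after this call.
    return float(np.concatenate(([1.0], x_query)) @ coef)


# Toy usage: a response whose slope changes at x = 5, mimicking two operating regimes.
X_demo = np.linspace(0.0, 10.0, 101).reshape(-1, 1)
y_demo = np.where(X_demo[:, 0] < 5.0, 2.0 * X_demo[:, 0], 15.0 - X_demo[:, 0])
print(jit_predict(X_demo, y_demo, np.array([8.0])))  # approximately 7.0
```

Because a fresh local model is fitted for every query, this approach can track distinct operating regimes without retraining a global model. The cost is one neighbour search per prediction.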
📝 Abstract
In cell culture bioprocessing, real-time batch process monitoring (BPM) is the continuous tracking and analysis of key process variables throughout a batch run. These variables include viable cell density, nutrient levels, metabolite concentrations, and product titer. BPM enables early detection of deviations and supports timely control actions to ensure optimal cell growth and product quality. It therefore plays a critical role in ensuring the quality and regulatory compliance of biopharmaceutical manufacturing processes. However, developing accurate soft sensors for BPM is hindered by several key challenges: limited historical data, infrequent feedback, heterogeneous process conditions, and high-dimensional sensory inputs. This study presents a comprehensive benchmark of machine learning (ML) methods designed to address these challenges. It focuses on learning from historical data of limited volume and relevance in the context of bioprocess monitoring. We evaluate multiple ML approaches, including feature dimensionality reduction, online learning, and just-in-time learning, on three datasets: one in silico dataset and two real-world experimental datasets. Our findings highlight the importance of the training strategy when data and feedback are limited. Batch learning is effective in homogeneous settings, whereas just-in-time learning and online learning adapt better in cold-start scenarios. Additionally, we identify key meta-features, such as feed media composition and process control strategies, that significantly affect model transferability. The results also suggest that integrating Raman-based predictions with lagged offline measurements improves monitoring accuracy, offering a promising direction for future bioprocess soft sensor development.
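A bias update is one simple way to combine frequent Raman-based predictions with offline assays that arrive hours later. When an assay result for a past sampling time becomes available, it is compared with the Raman prediction made at that same sampling time. Subsequent predictions are then shifted by a smoothed residual.

The sketch below assumes this exponentially smoothed bias correction, a hypothetical `LaggedBiasCorrector` class, and an arbitrary smoothing factor `alpha`. It only illustrates the idea of lag-aware fusion. It is not the integration scheme evaluated in the study.

```python
from dataclasses import dataclass


@dataclass
class LaggedBiasCorrector:
    """Hypothetical fusion of real-time Raman predictions with delayed offline assays.

    When an offline result for sampling time t_s arrives, the residual
    (offline value - Raman prediction at t_s) is blended into a running bias
    that is added to all later Raman predictions.
    """
    alpha: float = 0.5  # smoothing factor for the bias update (assumed value)
    bias: float = 0.0

    def update(self, offline_value: float, raman_pred_at_sample: float) -> None:
        # Compare against the prediction made when the sample was drawn,
        # not the current one, because the assay result is lagged.
        residual = offline_value - raman_pred_at_sample
        self.bias = (1.0 - self.alpha) * self.bias + self.alpha * residual

    def correct(self, raman_pred_now: float) -> float:
        # Apply the current bias estimate to a fresh Raman-based prediction.
        return raman_pred_now + self.bias


# Toy usage: the Raman model reads 2.0 units low; each assay arrives 4 h after sampling.
corrector = LaggedBiasCorrector(alpha=0.5)
corrector.update(offline_value=10.0, raman_pred_at_sample=8.0)  # assay for t = 0 h, received at t = 4 h
print(corrector.correct(6.0))  # 7.0: Raman reading at t = 4 h, partially corrected
corrector.update(offline_value=8.0, raman_pred_at_sample=6.0)  # assay for t = 4 h, received at t = 8 h
print(corrector.bias)  # 1.5, moving toward the true offset of 2.0
```

A larger `alpha` trusts the latest assay more and reacts faster to drift. A smaller value averages out assay noise at the cost of slower correction.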