🤖 AI Summary
Problem: Existing biomedical image processing plugins are typically built on models restricted to specific tasks and datasets, which makes them less practical for biologists with diverse analytical needs. Method: We introduce Orochi, an application-oriented image processor pre-trained on patches and volumes extracted from the raw data of over 100 publicly available studies using a Random Multi-scale Sampling strategy. Its pre-training scheme, Task-related Joint-embedding Pre-Training (TJP), uses biomedical task-related degradation for self-supervision in place of Masked Image Modelling (MIM), which performs poorly on downstream tasks such as registration. For computational efficiency, we build a Multi-head Hierarchy Mamba that exploits Mamba's linear computational complexity, and we provide a three-tier fine-tuning framework (Full, Normal, and Light). Results: Orochi achieves performance comparable or superior to current state-of-the-art specialist models on low-level tasks such as registration, fusion, restoration, and super-resolution, even with the lightweight parameter-efficient options. We present Orochi as the first application-oriented, efficient, and versatile image processor of this kind, intended as a step toward an all-in-one workflow that spares biologists from choosing among numerous models.
📝 Abstract
Deep learning has emerged as a pivotal tool for accelerating research in the life sciences, with the low-level processing of biomedical images (e.g., registration, fusion, restoration, super-resolution) being one of its most critical applications. Platforms such as ImageJ (Fiji) and napari have enabled the development of customized plugins for various models. However, these plugins are typically based on models that are limited to specific tasks and datasets, making them less practical for biologists. To address this challenge, we introduce Orochi, the first application-oriented, efficient, and versatile image processor designed to overcome these limitations. Orochi is pre-trained on patches/volumes extracted from the raw data of over 100 publicly available studies using our Random Multi-scale Sampling strategy. We further propose Task-related Joint-embedding Pre-Training (TJP), which employs biomedical task-related degradation for self-supervision rather than relying on Masked Image Modelling (MIM), which performs poorly in downstream tasks such as registration. To ensure computational efficiency, we leverage Mamba's linear computational complexity and construct Multi-head Hierarchy Mamba. Additionally, we provide a three-tier fine-tuning framework (Full, Normal, and Light) and demonstrate that Orochi achieves comparable or superior performance to current state-of-the-art specialist models, even with lightweight parameter-efficient options. We hope that our study contributes to the development of an all-in-one workflow, thereby relieving biologists of the overwhelming task of selecting among numerous models.
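The abstract names a Random Multi-scale Sampling strategy but does not describe how it works. The sketch below is one plausible reading, not the authors' implementation: it picks a random scale, crops a random window of the matching size, and subsamples it to a fixed patch size. The function name `sample_multiscale_patch`, the `patch` and `scales` parameters, and the strided subsampling are all assumptions made for illustration.

```python
import random


def sample_multiscale_patch(image, patch=4, scales=(1, 2, 4), rng=None):
    """Crop a random window at a random scale and stride it down to patch x patch.

    `image` is a 2D list of rows, a hypothetical stand-in for one image slice.
    Returns the patch and the scale that was drawn.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Keep only scales whose window (patch * scale) fits inside the image.
    usable = [s for s in scales if patch * s <= min(height, width)]
    if not usable:
        raise ValueError("image is smaller than the smallest window")
    scale = rng.choice(usable)
    window = patch * scale
    # Random top-left corner such that the window stays in bounds.
    top = rng.randint(0, height - window)
    left = rng.randint(0, width - window)
    # Strided subsampling: keep every `scale`-th pixel inside the window.
    rows = image[top:top + window:scale]
    return [row[left:left + window:scale] for row in rows], scale


if __name__ == "__main__":
    # Toy 16 x 16 "image" whose pixel value encodes its position.
    toy = [[r * 16 + c for c in range(16)] for r in range(16)]
    tile, used_scale = sample_multiscale_patch(toy, rng=random.Random(0))
    print(used_scale, len(tile), len(tile[0]))
```

Every returned patch has the same shape regardless of the scale drawn, so patches covering different fields of view can share one batch. A real pipeline would more likely use proper resampling (and 3D volumes) instead of plain striding.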