UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional biomedical image analysis typically employs separate models for text generation and region segmentation, leading to fragmented information processing and inflexible deployment. To address this, the authors propose UniBiomed, the first universal foundation model for grounded biomedical image interpretation, unifying clinical report generation and anatomical/pathological region segmentation. The method integrates a Multi-modal Large Language Model (MLLM) with the Segment Anything Model (SAM) in a synergistic architecture, enabling end-to-end, prompt-free grounded interpretation. A large-scale dataset of over 27 million image–annotation–text triplets spanning ten imaging modalities supports multi-task joint training. Evaluated across 84 internal and external datasets, UniBiomed achieves state-of-the-art performance on five core tasks: segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation, significantly improving both clinical analysis efficiency and result consistency.

📝 Abstract
Multi-modal interpretation of biomedical images opens up novel opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and segmentation models for target extraction, which results in inflexible real-world deployment and a failure to leverage holistic biomedical information. To this end, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed is based on a novel integration of Multi-modal Large Language Model (MLLM) and Segment Anything Model (SAM), which effectively unifies the generation of clinical texts and the segmentation of corresponding biomedical objects for grounded interpretation. In this way, UniBiomed is capable of tackling a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrated that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed can provide automated and end-to-end grounded interpretation for biomedical image analysis. This represents a novel paradigm shift in clinical workflows, which will significantly improve diagnostic efficiency. In summary, UniBiomed represents a novel breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.
Problem

Research questions and friction points this paper is trying to address.

Unifies clinical text generation and biomedical object segmentation
Addresses inflexibility in real-world biomedical image analysis
Eliminates need for manual pre-diagnosis and prompt crafting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates MLLM and SAM for unified analysis
Uses 27M image-text-annotation triplets dataset
Automates end-to-end biomedical image interpretation
Linshan Wu
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
Yuxiang Nie
Hong Kong University of Science and Technology
Natural Language Processing, Multi-modal Learning, Medical Image Analysis
Sunan He
Hong Kong University of Science and Technology
Multi-Modal Learning
Jiaxin Zhuang
PhD in CSE, HKUST
Computer Vision, Medical Image Analysis, Artificial Intelligence
Hao Chen
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.; Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.; Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China.; State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong, China.; Shenzhen-Hong Kong Collaborative Innovation Research In