🤖 AI Summary
Medical image segmentation currently lacks a systematic, reproducible benchmark for evaluating U-Net variants. To address this, we introduce the first large-scale, statistically rigorous benchmark for U-shaped networks: it evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities, assessing performance, robustness, zero-shot generalization, and computational efficiency. We propose the U-Score, a unified metric that quantifies the performance-efficiency trade-off, and release a model recommendation agent to facilitate task-driven architecture selection. The complete framework, including code, trained models, and evaluation protocols, is open-sourced. Our experiments expose critical limitations of existing evaluation practices, particularly poor cross-modal stability. The benchmark empirically demonstrates strong utility for fair model comparison, algorithmic diagnosis, and community-driven extension, establishing a foundational resource for advancing architectural research in medical image segmentation.
📝 Abstract
Over the past decade, U-Net has been the dominant architecture in medical image segmentation, giving rise to thousands of U-shaped variants. Despite their widespread adoption, there is still no comprehensive benchmark that systematically evaluates their performance and utility, largely because existing studies lack sufficient statistical validation and give limited consideration to efficiency and generalization across diverse datasets. To bridge this gap, we present U-Bench, the first large-scale, statistically rigorous benchmark that evaluates 100 U-Net variants across 28 datasets and 10 imaging modalities. Our contributions are threefold: (1) Comprehensive Evaluation: U-Bench evaluates models along three key dimensions: statistical robustness, zero-shot generalization, and computational efficiency. We introduce a novel metric, U-Score, which jointly captures the performance-efficiency trade-off, offering a deployment-oriented perspective on model progress. (2) Systematic Analysis and Model Selection Guidance: We summarize key findings from the large-scale evaluation and systematically analyze how dataset characteristics and architectural paradigms affect model performance. Based on these insights, we propose a model advisor agent that guides researchers in selecting the most suitable models for specific datasets and tasks. (3) Public Availability: We release all code, models, protocols, and weights, enabling the community to reproduce our results and extend the benchmark with future methods. In summary, U-Bench not only exposes gaps in previous evaluations but also establishes a foundation for fair, reproducible, and practically relevant benchmarking in the next decade of U-Net-based segmentation models. The project can be accessed at: https://fenghetan9.github.io/ubench. Code is available at: https://github.com/FengheTan9/U-Bench.
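The abstract does not reproduce the U-Score formula itself. As a purely illustrative sketch of how a performance-efficiency trade-off metric of this kind might be computed, the Python snippet below combines a segmentation-quality score (e.g., mean Dice) with efficiency terms normalized against a reference compute budget. The function `u_score`, the `ModelResult` schema, the weighting scheme, and the reference budgets are all assumptions made for illustration; the actual U-Score definition is given in the paper and repository.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Per-model benchmark measurements (hypothetical schema, not U-Bench's)."""
    name: str
    dice: float       # mean Dice score in [0, 1]
    params_m: float   # parameter count, in millions
    gflops: float     # inference cost, in GFLOPs


def u_score(result: ModelResult, ref_params_m: float = 30.0,
            ref_gflops: float = 50.0, alpha: float = 0.5) -> float:
    """Illustrative performance-efficiency trade-off score (higher is better).

    NOTE: this is NOT the U-Score formula from the U-Bench paper; it is a
    hypothetical example that rewards accuracy and penalizes cost relative
    to an assumed reference budget.
    """
    # Efficiency in (0, 1]: 1.0 at or under the reference budget,
    # decaying as the model grows more expensive.
    eff_params = min(1.0, ref_params_m / result.params_m)
    eff_flops = min(1.0, ref_gflops / result.gflops)
    efficiency = (eff_params * eff_flops) ** 0.5  # geometric mean of the two

    # Weighted geometric mean of accuracy and efficiency; alpha trades
    # performance against deployment cost.
    return (result.dice ** alpha) * (efficiency ** (1.0 - alpha))


if __name__ == "__main__":
    candidates = [
        ModelResult("small-unet", dice=0.86, params_m=8.0, gflops=12.0),
        ModelResult("large-variant", dice=0.89, params_m=120.0, gflops=310.0),
    ]
    # Rank candidates by the illustrative score: a slightly less accurate
    # but far cheaper model can rank above a heavier one.
    for m in sorted(candidates, key=u_score, reverse=True):
        print(f"{m.name}: u_score={u_score(m):.3f}")
```

The design intent such a metric captures, under these assumptions, is that headline accuracy alone can hide large deployment costs; folding normalized efficiency into a single score makes the trade-off explicit when comparing variants.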