🤖 AI Summary
In deformable image registration (DIR), a fundamental trade-off exists between alignment accuracy and deformation regularity, and conventional single-point evaluation metrics fail to characterize it globally. To address this, we propose the Alignment-Regularity Characteristic (ARC) evaluation framework, the first to continuously quantify this trade-off via an ARC curve. Our approach employs a HyperNetwork to enable efficient, differentiable interpolation across the full spectrum of regularization strengths. It combines differentiable deformation regularization with multi-metric analysis, including the Dice score, Jacobian determinant statistics, and folding rate, to jointly assess alignment quality and deformation plausibility. Extensive experiments on two public benchmarks demonstrate that ARC reveals critical performance inflection points and inherent model preferences overlooked by traditional evaluation protocols. The framework provides deeper insight into DIR model behavior and supports principled hyperparameter selection. Our implementation is publicly available.
📝 Abstract
Evaluating deformable image registration (DIR) is challenging due to the inherent trade-off between achieving high alignment accuracy and maintaining deformation regularity. In this work, we introduce a novel evaluation scheme based on the alignment-regularity characteristic (ARC) to systematically capture and analyze this trade-off. We first define ARC curves, which describe the performance of a given registration algorithm as a spectrum measured by alignment and regularity metrics. We further adopt a HyperNetwork-based approach that learns to continuously interpolate across the full regularization range, accelerating the construction of ARC curves and improving their sample density. We empirically demonstrate our evaluation scheme using representative learning-based DIR methods with various network architectures and transformation models on two public datasets. We present a range of findings not evident from existing evaluation practices and provide general recommendations for model evaluation and selection using our scheme. All relevant code is made publicly available.
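To make the two axes of an ARC curve concrete, the sketch below shows how one point on such a curve could be measured for a predicted deformation: an alignment metric (Dice overlap of warped segmentations) and regularity metrics (Jacobian determinant statistics and folding rate). This is a minimal illustration assuming a dense 2D displacement field and finite-difference Jacobians, not the paper's implementation; function names are hypothetical.

```python
import numpy as np

def jacobian_determinant_2d(disp):
    """Per-pixel determinant of the Jacobian of the mapping
    phi(x) = x + disp(x), for a displacement field disp of shape (H, W, 2),
    where disp[..., k] is the displacement along axis k.
    Uses finite differences (np.gradient), a common approximation."""
    du0_d0, du0_d1 = np.gradient(disp[..., 0])
    du1_d0, du1_d1 = np.gradient(disp[..., 1])
    # det(I + grad u) in 2D
    return (1.0 + du0_d0) * (1.0 + du1_d1) - du0_d1 * du1_d0

def folding_rate(disp):
    """Fraction of pixels with non-positive Jacobian determinant,
    i.e. where the deformation locally folds and is not invertible."""
    det = jacobian_determinant_2d(disp)
    return float((det <= 0).mean())

def dice_score(a, b):
    """Dice overlap between two binary masks (e.g. a warped moving
    segmentation and the fixed segmentation)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * (a & b).sum() / denom

# One ARC point: (alignment, regularity) for a given regularization weight.
# With a hypernetwork, disp would be predicted conditioned on that weight.
disp = np.zeros((64, 64, 2))          # identity transform as a toy example
fixed_seg = np.zeros((64, 64)); fixed_seg[16:48, 16:48] = 1
warped_seg = fixed_seg.copy()          # stand-in for the warped segmentation
point = (dice_score(warped_seg, fixed_seg), folding_rate(disp))
```

Sweeping the regularization weight (densely, via the hypernetwork's conditioning input) and collecting such points traces out the full ARC curve.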