🤖 AI Summary
Model conversion across deep learning frameworks can fail outright or severely degrade accuracy because of inconsistencies in model input, parameters, hyperparameters, and the computational graph. To address this, we propose FetaFix, an automated approach that localizes and repairs conversion faults by comparing the output labels of the source and target models image by image. FetaFix draws on a set of fault types mined from conversion issues reported in code repositories and forums, checks the converted target model against the source model, applies targeted repairs such as replacing target parameters with those of the source model, and iterates until the label differences are resolved. Evaluated on three image recognition models converted across four deep learning frameworks (12 conversion paths total), FetaFix repaired 462 of 755 detected faults. Of the 15 erroneous conversion cases, 14 were either completely repaired or significantly improved.
📝 Abstract
Converting deep learning models between frameworks is a common step to maximize model compatibility across devices and to leverage optimization features that may be exclusive to one deep learning framework. However, the conversion process may be riddled with bugs that leave the converted models either undeployable or problematic, considerably degrading their prediction correctness. In this paper, we propose FetaFix, an automated approach for fault localization and repair during model conversion between deep learning frameworks. FetaFix is capable of detecting and fixing faults introduced in the model input, parameters, hyperparameters, and the model graph during conversion. FetaFix uses a set of fault types (mined by surveying common conversion issues reported in code repositories and forums) to localize potential conversion faults in the converted target model and then repair them appropriately, e.g., by replacing the parameters of the target model with those from the source model. This is done iteratively for every image in the dataset, comparing output labels between the source model and the converted target model until all differences are resolved. We evaluate the effectiveness of FetaFix in fixing model conversion bugs in three widely used image recognition models converted across four different deep learning frameworks. Overall, FetaFix was able to fix $462$ out of $755$ detected conversion faults, either completely repairing or significantly improving the performance of $14$ out of the $15$ erroneous conversion cases.
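To make the iterative compare-and-repair idea concrete, here is a minimal sketch in plain Python. It is not FetaFix's implementation: the toy linear classifier, the `Params` layout, and every function name (`predict_label`, `mismatched_inputs`, `differing_layers`, `repair`) are hypothetical, and it covers only the parameter-replacement repair mentioned in the abstract, assuming the source and target models share layer names and shapes.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: one flat weight vector per output class, keyed by layer name.
Params = Dict[str, List[float]]


def predict_label(params: Params, x: Sequence[float]) -> int:
    """Toy classifier: score each class by a dot product, return the argmax index."""
    scores = [sum(w * xi for w, xi in zip(params[name], x)) for name in sorted(params)]
    return max(range(len(scores)), key=scores.__getitem__)


def mismatched_inputs(source: Params, target: Params, inputs) -> List[int]:
    """Indices of inputs whose predicted label differs between the two models."""
    return [i for i, x in enumerate(inputs)
            if predict_label(source, x) != predict_label(target, x)]


def differing_layers(source: Params, target: Params, tol: float = 1e-6) -> List[str]:
    """Layers whose weights differ by more than tol (assumes identical names and shapes)."""
    return [name for name in source
            if any(abs(a - b) > tol for a, b in zip(source[name], target[name]))]


def repair(source: Params, target: Params, inputs, tol: float = 1e-6) -> Tuple[Params, List[str]]:
    """For each input with a label mismatch, copy source weights into the
    differing target layers one at a time until the labels agree."""
    repaired = {name: list(w) for name, w in target.items()}  # leave target untouched
    replaced: List[str] = []
    for x in inputs:
        if predict_label(source, x) == predict_label(repaired, x):
            continue
        for name in differing_layers(source, repaired, tol):
            repaired[name] = list(source[name])
            replaced.append(name)
            if predict_label(source, x) == predict_label(repaired, x):
                break
    return repaired, replaced


# Toy data: the "conversion" flipped the sign of one weight in class1.
source = {"class0": [1.0, 0.0], "class1": [0.0, 1.0]}
target = {"class0": [1.0, 0.0], "class1": [0.0, -1.0]}
inputs = [[2.0, 1.0], [0.5, 3.0]]

repaired, replaced = repair(source, target, inputs)
```

In this toy setup only the second input exposes the fault, so the loop replaces a single layer and stops. A real pipeline would also have to handle input preprocessing, hyperparameter, and graph-level faults, which this sketch does not attempt.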