🤖 AI Summary
To address storage bloat, maintenance complexity, and computational redundancy arising from multi-model deployment in automated educational assessment, this paper proposes a model merging framework grounded in the Gromov–Wasserstein (GW) distance. We pioneer the use of GW distance to quantify cross-task model compatibility via feature distribution alignment and design a lightweight modular architecture comprising a shared feature extractor and task-specific classification heads. The method preserves model generalizability while substantially reducing resource overhead: compared to the GPT-o1-based merging baseline, it achieves statistically significant improvements in micro F1 (p = 0.04) and per-label accuracy (p = 0.01), outperforms all baselines across all metrics, halves model storage, and incurs negligible accuracy degradation. Our core contributions are (i) a GW distance–driven cross-task alignment mechanism and (ii) an efficient modular fusion paradigm for scalable, resource-conscious model integration in educational AI systems.
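The summary's compatibility-driven grouping can be illustrated with a small sketch: given precomputed pairwise GW distances between models, pick the most compatible pairs first, then attach any leftover model to its nearest group (capped at trios). The function name and the greedy strategy are illustrative assumptions, not the paper's exact algorithm; the distances would come from the GW computation on extracted feature distributions.

```python
def greedy_merge_groups(dist, max_group=3):
    """Group model indices into pairs/trios by smallest pairwise GW distance.

    dist: dict mapping frozenset({i, j}) -> GW distance between models i and j.
    Returns a list of groups (lists of model indices).
    NOTE: a hypothetical greedy sketch; the paper does not specify the
    grouping procedure beyond "smallest pairwise distances, pairs or trios".
    """
    models = sorted({m for pair in dist for m in pair})
    unassigned = set(models)
    groups = []
    # Consider pairs from most to least compatible (smallest distance first).
    for pair, _ in sorted(dist.items(), key=lambda kv: kv[1]):
        i, j = sorted(pair)
        if i in unassigned and j in unassigned:
            groups.append([i, j])
            unassigned -= {i, j}
    # Attach each leftover model to its closest group with spare capacity.
    for m in sorted(unassigned):
        best = min(
            (g for g in groups if len(g) < max_group),
            key=lambda g: min(dist[frozenset({m, x})] for x in g),
            default=None,
        )
        if best is not None:
            best.append(m)
        else:
            groups.append([m])
    return groups
```

For example, with four models where models 0/1 and 2/3 have the smallest distances, the sketch yields the two pairs `[[0, 1], [2, 3]]`.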
📝 Abstract
Automatic scoring of student responses enhances efficiency in education, but deploying a separate neural network for each task increases storage demands, maintenance effort, and redundant computation. To address these challenges, this paper introduces the Gromov–Wasserstein Scoring Model Merging (GW-SMM) method, which merges models based on feature distribution similarities measured via the Gromov–Wasserstein distance. Our approach begins by extracting features from student responses using individual models, capturing both item-specific context and unique learned representations. The Gromov–Wasserstein distance then quantifies the similarity between these feature distributions, identifying the most compatible models for merging. Models exhibiting the smallest pairwise distances, typically in pairs or trios, are merged by combining only the shared layers preceding the classification head. This strategy yields a unified feature extractor while preserving separate classification heads for item-specific scoring. We validated our approach against human expert knowledge and a GPT-o1-based merging method. GW-SMM consistently outperformed both, achieving higher micro F1, macro F1, exact match accuracy, and per-label accuracy. The improvements in micro F1 and per-label accuracy were statistically significant compared to GPT-o1-based merging (p = 0.04 and p = 0.01, respectively). Additionally, GW-SMM halved storage requirements with only a minimal loss of accuracy, demonstrating its computational efficiency alongside reliable scoring performance.
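The merging step described above, combining only the shared layers preceding the classification head, can be sketched as follows. This is a minimal NumPy illustration assuming the shared layers are combined by parameter averaging (the abstract says "combining" without specifying the operator) and that each model's parameters are held in a flat name-to-array dict; the function and key names are hypothetical.

```python
import numpy as np

def merge_models(models, shared_keys):
    """Fuse the shared feature extractor; keep task-specific heads separate.

    models: list of dicts mapping parameter name -> np.ndarray.
    shared_keys: names of parameters belonging to the shared feature extractor
                 (all layers preceding the classification head).
    Returns (shared, heads): one averaged set of extractor parameters, plus
    each model's own classification-head parameters for item-specific scoring.
    NOTE: averaging is an assumption for illustration, not the paper's
    confirmed fusion operator.
    """
    shared = {
        k: np.mean([m[k] for m in models], axis=0) for k in shared_keys
    }
    heads = [
        {k: v for k, v in m.items() if k not in shared_keys} for m in models
    ]
    return shared, heads
```

Keeping one extractor and per-item heads is what halves storage: the extractor, typically the bulk of the parameters, is stored once per merged group instead of once per item.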