🤖 AI Summary
This work addresses representation conflict in low-resource multilingual speech translation caused by uniform cross-lingual parameter sharing. To mitigate this limitation, the authors propose a fine-grained sharing strategy guided by analysis of training gradients, which automatically determines language-specific sharing patterns across model layers through a three-tier mechanism: first, languages are clustered by gradient distance; second, model capacity is dynamically allocated according to intra- and inter-task gradient divergence; and third, subspaces are aligned via joint factorization coupled with canonical correlation analysis. Evaluated with the SeamlessM4T-Medium architecture on four language pairs, the approach yields significant improvements in translation quality, demonstrating the effectiveness and generality of gradient-driven parameter sharing in multilingual speech translation.
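The summary gives no implementation details for the first tier (clustering languages by gradient distance). Purely as an illustrative sketch, the toy below groups languages whose per-language gradient vectors lie close under cosine distance; the function names, the cosine metric, the greedy single-link merge, and the 0.5 threshold are all assumptions for illustration, not the authors' actual procedure.

```python
import numpy as np

def grad_distance(g1, g2):
    # Cosine distance between flattened per-language gradient vectors
    g1, g2 = g1.ravel(), g2.ravel()
    return 1.0 - float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

def cluster_languages(grads, threshold=0.5):
    # Greedy single-link clustering: merge any two clusters containing
    # languages whose gradient distance falls below the threshold
    # (threshold value is a hypothetical choice)
    clusters = [{lang} for lang in grads]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(grad_distance(grads[a], grads[b]) < threshold
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] |= clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    return clusters

# Toy gradients: two correlated "languages" plus one unrelated one
rng = np.random.default_rng(0)
base = rng.normal(size=64)
grads = {
    "es": base + 0.1 * rng.normal(size=64),
    "pt": base + 0.1 * rng.normal(size=64),
    "zh": rng.normal(size=64),
}
clusters = cluster_languages(grads)
print(clusters)  # es and pt share a cluster; zh stays separate
```

Real per-language gradients would be averaged over many training batches and possibly computed per layer, so that sharing decisions can differ across the network's depth, as the three-tier mechanism above suggests.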
📝 Abstract
In low-resource multilingual speech-to-text translation, uniform architectural sharing across languages frequently introduces representation conflicts that impede convergence. This work proposes a principled methodology for automatically determining layer-specific sharing patterns by mining training gradient information. Our approach employs three distinct analysis strategies: distance-based language clustering, self/cross-task divergence metrics for capacity allocation, and joint factorization coupled with canonical correlation analysis for subspace alignment. Extensive evaluation on four language pairs (using the SeamlessM4T-Medium architecture) demonstrates consistent improvements in translation quality metrics.
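The abstract's third strategy uses canonical correlation analysis (CCA) to measure subspace alignment between representations. A minimal numpy sketch of plain CCA is given below, assuming it is applied to two matrices of layer activations (rows = samples, columns = features); the function name and the whitening-via-SVD formulation are illustrative choices, not taken from the paper.

```python
import numpy as np

def cca_correlations(X, Y):
    """Canonical correlations between two representation matrices
    (rows = samples, columns = features), via whitening + SVD."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases for each view's column space (whitening step)
    Ux, _, _ = np.linalg.svd(X, full_matrices=False)
    Uy, _, _ = np.linalg.svd(Y, full_matrices=False)
    # Canonical correlations are the singular values of Ux^T Uy
    return np.linalg.svd(Ux.T @ Uy, compute_uv=False)

# Two views related by an invertible linear map span the same subspace,
# so every canonical correlation comes out (numerically) equal to 1
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
Y = X @ rng.normal(size=(8, 8))   # same subspace, different basis
print(cca_correlations(X, Y).round(4))  # all canonical correlations ≈ 1
```

High canonical correlations between two languages' activations at a given layer indicate their representations occupy aligned subspaces, which is one plausible way such a metric could inform layer-wise sharing decisions.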