FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings

📅 2026-04-20

🤖 AI Summary

Existing approaches struggle to diagnose language and modality biases in pretrained multilingual, multimodal sentence embeddings. This work proposes Factorized Linear Projection (FLiP), which supports interpretable analysis by recovering the lexical content of sentences directly from their embeddings, exposing the structure and systematic biases of an embedding space without relying on downstream tasks. Experiments on LaBSE (multilingual), SONAR (multimodal), and Gemini (API-based) show that FLiP significantly outperforms non-factorized baselines on high- and mid-resource languages, recalling more than 75% of lexical content. To our knowledge, this is the first approach capable of fine-grained, cross-lingual, and cross-modal diagnosis of embedding biases.

📝 Abstract

This paper presents factorized linear projection (FLiP) models for understanding pretrained sentence embedding spaces. We train FLiP models to recover the lexical content from multilingual (LaBSE), multimodal (SONAR), and API-based (Gemini) sentence embedding spaces in several high- and mid-resource languages. We show that FLiP can recall more than 75% of the lexical content from the embeddings, significantly outperforming existing non-factorized baselines. Using FLiP as a diagnostic tool, we uncover modality and language biases across the selected sentence encoders and give practitioners intrinsic insights into the encoders without relying on conventional downstream evaluation tasks. Our implementation is publicly available at https://github.com/BUTSpeechFIT/FLiP.
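
The abstract does not spell out the model, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that "factorized" means a low-rank product of two matrices mapping a sentence embedding to one score per vocabulary word, and that lexical recall is the fraction of reference words found among the top-k scored words. All names (`flip_scores`, `top_k_words`, `lexical_recall`), the toy matrices, and the toy embedding are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def flip_scores(embedding: Sequence[float], down: Matrix, up: Matrix) -> List[float]:
    """Score every vocabulary word with a factorized projection.

    Assumption: down is (rank x dim) and up is (vocab x rank), so the
    effective (vocab x dim) projection up @ down has rank at most `rank`
    and is never materialized.
    """
    return matvec(up, matvec(down, embedding))


def top_k_words(scores: Sequence[float], vocab: Sequence[str], k: int) -> List[str]:
    """Return the k vocabulary words with the highest scores."""
    ranked = sorted(range(len(vocab)), key=lambda i: scores[i], reverse=True)
    return [vocab[i] for i in ranked[:k]]


def lexical_recall(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of distinct reference words that appear among the predictions."""
    ref = set(reference)
    if not ref:
        return 0.0
    return len(ref & set(predicted)) / len(ref)


# Toy setup (hypothetical values, chosen by hand rather than trained).
vocab = ["cat", "dog", "runs", "sleeps"]
down = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]            # rank 2 x dim 3
up = [[1.0, 0.0],
      [-1.0, 0.0],
      [0.0, 1.0],
      [0.0, -1.0]]                  # vocab 4 x rank 2
embedding = [0.9, 0.8, 0.1]         # stand-in for an encoder output

scores = flip_scores(embedding, down, up)
predicted = top_k_words(scores, vocab, k=2)
recall = lexical_recall(predicted, ["cat", "runs"])
```

In a real setting the two factors would be learned from pairs of sentence embeddings and their word sets, and recall would be averaged over a held-out set per language or modality. Comparing those averages across languages and modalities is what would make such a probe usable as a bias diagnostic.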

Problem

Research questions and friction points this paper is trying to address.

multimodal

multilingual

sentence embeddings

language bias

modality bias

Innovation

Methods, ideas, or system contributions that make the work stand out.

factorized linear projection

sentence embedding analysis

multilingual embeddings