FLiP: Towards understanding and interpreting multimodal multilingual sentence embeddings

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing approaches struggle to diagnose language and modality biases in multilingual, multimodal pretrained sentence embeddings. This work proposes Factorized Linear Projection (FLiP), a model trained to recover the lexical content of a sentence directly from its embedding, so that the structure and systematic biases of the embedding space can be inspected without relying on downstream tasks. Experiments on three encoders (multilingual LaBSE, multimodal SONAR, and API-based Gemini) in several high- and mid-resource languages show that FLiP recalls more than 75% of the lexical content and significantly outperforms non-factorized baselines. Used as a diagnostic tool, FLiP exposes the modality and language biases of the selected encoders.

📝 Abstract
This paper presents factorized linear projection (FLiP) models for understanding pretrained sentence embedding spaces. We train FLiP models to recover the lexical content from multilingual (LaBSE), multimodal (SONAR) and API-based (Gemini) sentence embedding spaces in several high- and mid-resource languages. We show that FLiP can recall more than 75% of lexical content from the embeddings, significantly outperforming existing non-factorized baselines. Using this as a diagnostic tool, we uncover the modality and language biases across the selected sentence encoders and provide practitioners with intrinsic insights about the encoders without relying on conventional downstream evaluation tasks. Our implementation is publicly available at https://github.com/BUTSpeechFIT/FLiP.
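To make the idea of recovering lexical content through a factorized linear projection more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not specify how FLiP factorizes the projection, so the low-rank form (a d x r matrix followed by an r x V matrix), the bias term, the top-k recall metric, and every name below are assumptions chosen for illustration. The weights are random rather than trained, so the numbers only check the mechanics.

```python
import numpy as np


def factorized_logits(E, A, B, bias):
    """Map sentence embeddings to vocabulary logits through a rank-r bottleneck.

    E: (n, d) embeddings, A: (d, r), B: (r, V), bias: (V,).
    Equals E @ (A @ B) + bias without building the full d x V matrix.
    """
    return (E @ A) @ B + bias


def lexical_recall(logits, bow, k):
    """Average fraction of each sentence's reference words found in its top-k predictions.

    bow: (n, V) binary bag-of-words matrix marking the words in each sentence.
    """
    topk = np.argsort(-logits, axis=1)[:, :k]            # indices of the k highest logits
    hits = np.take_along_axis(bow, topk, axis=1).sum(axis=1)
    total = np.maximum(bow.sum(axis=1), 1)               # guard against empty sentences
    return float(np.mean(hits / total))


def param_counts(d, r, V):
    """Return (factorized, non-factorized) weight counts, ignoring the bias."""
    return d * r + r * V, d * V


# Toy setup with random (untrained) weights; sizes are hypothetical.
rng = np.random.default_rng(0)
n, d, r, V = 4, 16, 4, 50
E = rng.normal(size=(n, d))
A = rng.normal(size=(d, r))
B = rng.normal(size=(r, V))
bias = np.zeros(V)

logits = factorized_logits(E, A, B, bias)

# Hand-built check of the metric: top-2 predictions are words 0 and 1,
# reference words are 0 and 3, so one of two reference words is recovered.
toy_recall = lexical_recall(np.array([[3.0, 2.0, 1.0, 0.0]]),
                            np.array([[1, 0, 0, 1]]), k=2)
```

In this sketch the factorized map stores d*r + r*V weights instead of d*V, which is the usual motivation for a low-rank projection when the vocabulary V is large. In practice A, B, and the bias would be fitted to (embedding, bag-of-words) pairs, for example with a per-word binary cross-entropy loss, which is again an assumption rather than something the abstract states.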
Problem

Research questions and friction points this paper is trying to address.

multimodal
multilingual
sentence embeddings
language bias
modality bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

factorized linear projection
sentence embedding analysis
multilingual embeddings
multimodal embeddings
embedding interpretability