🤖 AI Summary
This study investigates whether Matryoshka Representation Learning (MRL) is truly necessary for text embeddings to remain effective when truncated to smaller sizes. The authors apply the same prefix truncation used by MRL to pretrained text encoders trained with and without MRL and evaluate them across several downstream tasks. Non-MRL models achieve comparable or even superior performance to MRL models under light-to-moderate truncation, that is, when embedding size is reduced by less than 80%. MRL shows a clear advantage only under heavy truncation (a size reduction of at least 80%). These findings challenge the assumption that MRL is essential for robustness to embedding truncation. They suggest instead that such robustness does not necessarily come from the MRL training objective, and that MRL's additional training cost pays off mainly when heavy truncation is required.
📝 Abstract
Matryoshka Representation Learning (MRL) is a widely adopted approach for training text encoders so that they provide useful text representations at various sizes, obtained by simply truncating the resulting vectors to sizes pre-determined at training time. Recent works have shown that randomly truncating text embeddings has minimal impact on downstream performance unless vectors are reduced in size by at least 70%, suggesting that embeddings are already robust to truncation without the use of MRL. However, no prior work has compared random truncation to MRL, so it is unclear how the two compare as embedding reduction methods. In this paper, we study this by applying the same truncation used by MRL to models trained with and without MRL. Our results across several models and downstream tasks show that, unless embeddings are heavily truncated (i.e., reduced in size by at least 80%), truncated embeddings of non-MRL models are competitive with, and often outperform, those of models trained with MRL. This suggests that truncation robustness may not necessarily come from MRL, and that whether MRL's additional training cost is worth paying depends on whether heavy truncation is desired.
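To make the two truncation schemes concrete, here is a minimal NumPy sketch; it is not the authors' code. It assumes embeddings are compared by cosine similarity after L2-renormalizing the truncated vectors, and it uses top-1 nearest-neighbour agreement with the full-size embeddings as a hypothetical stand-in for a real retrieval benchmark. The function names (`prefix_dims`, `random_dims`, `truncate`, `top1_agreement`) and the synthetic data are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against zero vectors)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def prefix_dims(d: int, k: int) -> np.ndarray:
    """MRL-style truncation: keep the first k of d dimensions."""
    if not 1 <= k <= d:
        raise ValueError(f"k must be in [1, {d}], got {k}")
    return np.arange(k)


def random_dims(d: int, k: int, rng: np.random.Generator) -> np.ndarray:
    """Random truncation: keep a random subset of k dimensions (sorted)."""
    if not 1 <= k <= d:
        raise ValueError(f"k must be in [1, {d}], got {k}")
    return np.sort(rng.choice(d, size=k, replace=False))


def truncate(emb: np.ndarray, dims: np.ndarray) -> np.ndarray:
    """Keep only the selected dimensions, then renormalize for cosine similarity."""
    return l2_normalize(emb[:, dims])


def top1_agreement(queries: np.ndarray, docs: np.ndarray, dims: np.ndarray) -> float:
    """Fraction of queries whose nearest document is unchanged after truncation.

    The same `dims` must be applied to queries and documents so that both
    live in the same reduced space.
    """
    full_top = np.argmax(l2_normalize(queries) @ l2_normalize(docs).T, axis=1)
    trunc_top = np.argmax(truncate(queries, dims) @ truncate(docs, dims).T, axis=1)
    return float(np.mean(full_top == trunc_top))


if __name__ == "__main__":
    # Synthetic stand-in data: each query is a noisy copy of one document.
    rng = np.random.default_rng(0)
    d = 768
    docs = rng.normal(size=(1000, d))
    queries = docs[:100] + 1.0 * rng.normal(size=(100, d))

    # Kept sizes correspond to roughly 0%, 50%, 80% and 90% size reduction.
    for k in (768, 384, 154, 77):
        p = top1_agreement(queries, docs, prefix_dims(d, k))
        r = top1_agreement(queries, docs, random_dims(d, k, rng))
        print(f"k={k:4d}  prefix={p:.2f}  random={r:.2f}")
```

On isotropic synthetic vectors like these, prefix and random truncation are statistically equivalent, because no dimension carries more information than another. Any gap between them only shows up with real encoder embeddings, and measuring that gap is the subject of the paper.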