🤖 AI Summary
This work addresses the challenge of building efficient, consistent nested (Matryoshka) embeddings under varying computational budgets by explicitly coordinating how information is distributed across embedding dimensions and model depth. The authors propose the MIPIC framework, which uses Self-Distilled Intra-Relational Alignment (SIA) to keep truncated representations structurally consistent with the full representation, and Progressive Information Chaining (PIC) to transfer mature task semantics from deeper layers into earlier layers in stages. SIA aligns token-level geometric and attention-driven relations through top-k CKA self-distillation. The framework is evaluated on architectures ranging from TinyBERT to BGE-M3 and Qwen3. On STS, NLI, and classification tasks, MIPIC is highly competitive at every embedding size, with its largest advantages appearing at extremely low embedding dimensions.
📝 Abstract
Representation learning is fundamental to NLP, but building embeddings that work well at different computational budgets is challenging. Matryoshka Representation Learning (MRL) offers a flexible inference paradigm through nested embeddings; however, learning such structures requires explicit coordination of how information is arranged across embedding dimensionality and model depth. In this work, we propose MIPIC (Matryoshka Representation Learning via Self-Distilled Intra-Relational Alignment and Progressive Information Chaining), a unified training framework designed to produce structurally coherent and semantically compact Matryoshka representations. MIPIC promotes cross-dimensional structural consistency through Self-Distilled Intra-Relational Alignment (SIA), which aligns token-level geometric and attention-driven relations between full and truncated representations using top-k CKA self-distillation. Complementarily, it enables depth-wise semantic consolidation via Progressive Information Chaining (PIC), a scaffolded alignment strategy that incrementally transfers mature task semantics from deeper layers into earlier layers. Extensive experiments on STS, NLI, and classification benchmarks, spanning models from TinyBERT to BGE-M3 and Qwen3, demonstrate that MIPIC yields Matryoshka representations that are highly competitive across all capacities, with significant performance advantages in extremely low-dimensional settings.
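To make the two building blocks named above more concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) how a Matryoshka embedding is typically truncated to a prefix of its dimensions and re-normalized, and (b) the standard linear CKA similarity between a full token representation and its truncated prefix. The function names `truncate_and_normalize` and `linear_cka`, the random toy data, and the choice of plain linear CKA are assumptions made for illustration; the abstract does not specify how the top-k selection or the attention-driven relations are computed, so those are left out.

```python
import numpy as np


def truncate_and_normalize(emb: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each embedding and L2-normalize.

    This is the usual way a nested (Matryoshka) embedding is shortened.
    """
    sub = emb[..., :dim]
    norm = np.linalg.norm(sub, axis=-1, keepdims=True)
    return sub / np.clip(norm, 1e-12, None)


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Standard linear CKA between x of shape (n, d1) and y of shape (n, d2).

    Both inputs describe the same n items (here: tokens). Features are
    centered first; the result lies in [0, 1].
    """
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Toy data (assumption): 32 token vectors with 64 dimensions each.
rng = np.random.default_rng(0)
tokens_full = rng.normal(size=(32, 64))
tokens_prefix = tokens_full[:, :16]  # truncated "student" view

cka_self = linear_cka(tokens_full, tokens_full)
cka_prefix = linear_cka(tokens_full, tokens_prefix)

# A self-distillation-style alignment term could then be written as
# 1 - CKA, which is zero when the two relational structures match.
alignment_loss = 1.0 - cka_prefix

short = truncate_and_normalize(tokens_full, 16)
```

In a training setup along these lines, the full representation would act as the teacher and each truncated prefix as the student, with `alignment_loss` added to the task loss. How MIPIC weights these terms and selects the top-k relations is not stated in the abstract.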