🤖 AI Summary
To address the limited semantic richness and robustness of training-free sentence embeddings, this work integrates the zero-shot generative capability of large language models (LLMs) into the training-free paradigm. Specifically, it uses meaning-preserving yet diverse sentence transformations to generate augmented samples, then aggregates the embeddings of these transformations (e.g., by mean pooling) into a single, more stable and prompt-robust sentence representation. Crucially, the method bypasses conventional contrastive learning and relies solely on off-the-shelf pretrained LLMs for both semantic augmentation and embedding extraction. On the semantic textual similarity (STS) benchmark, it improves over prior training-free approaches by an average of 2.85 points across several LLMs, and it also delivers notable gains on multiple MTEB tasks, including clustering, reranking, and sentence-pair classification.
📝 Abstract
Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning. Previous training-free embedding methods have mainly focused on optimizing embedding prompts and have overlooked the benefits of utilizing the generative abilities of LLMs. We propose a novel method, GenEOL, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning, and aggregates the resulting embeddings of these transformations to enhance the overall sentence embedding. GenEOL significantly outperforms the existing training-free embedding methods by an average of 2.85 points across several LLMs on the semantic textual similarity (STS) benchmark. GenEOL also achieves notable gains in clustering, reranking, and pair-classification tasks from the MTEB benchmark. Additionally, GenEOL stabilizes representation quality across LLM layers and remains robust to perturbations of embedding prompts.
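The pipeline the abstract describes (generate meaning-preserving transformations, embed each, aggregate) can be sketched as follows. This is a minimal illustration of the aggregation idea only: `generate_transformations` and `embed` are hypothetical stand-ins here (a real implementation would prompt a pretrained LLM for both), and mean pooling is one plausible aggregation choice, not necessarily the paper's exact recipe.

```python
import hashlib
import numpy as np

def generate_transformations(sentence: str, n: int = 4) -> list[str]:
    # Stand-in for an LLM call that produces n diverse,
    # meaning-preserving rewrites (paraphrases, reorderings, etc.).
    return [f"{sentence} (variant {i})" for i in range(n)]

def embed(text: str, dim: int = 16) -> np.ndarray:
    # Stand-in for extracting a sentence embedding from a pretrained
    # LLM under an embedding prompt. Here: a deterministic random
    # vector seeded by the text, purely for demonstration.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).standard_normal(dim)
    return v / np.linalg.norm(v)

def geneol_embedding(sentence: str, n_transforms: int = 4) -> np.ndarray:
    # Core idea: embed the sentence plus its transformations,
    # then aggregate (mean-pool) into one sentence embedding.
    texts = [sentence] + generate_transformations(sentence, n_transforms)
    vecs = np.stack([embed(t) for t in texts])
    pooled = vecs.mean(axis=0)
    return pooled / np.linalg.norm(pooled)
```

Averaging over several independent generations is what smooths out sensitivity to any single prompt or phrasing, which is consistent with the robustness properties reported above.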