🤖 AI Summary
This study investigates whether reasoning models fine-tuned via Reinforcement Learning with Verifiable Rewards (RLVR) exhibit enhanced semantic representation capabilities when used as initializers for embedding models. Systematic evaluations on the MTEB and BRIGHT benchmarks reveal that RLVR does not consistently improve embedding performance. To understand this limitation, the authors propose a Hierarchical Representation Similarity Analysis (HRSA) framework, integrated with manifold geometric analysis, which demonstrates that RLVR induces only local geometric reorganization while preserving global structure. Notably, the study identifies for the first time a "manifold realignment" phenomenon caused by contrastive learning. These findings indicate that RLVR optimization remains confined within the existing semantic landscape and fails to reconstruct the underlying semantic space.
📝 Abstract
State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: does enhanced reasoning translate into superior semantic representations when these models serve as embedding initializations? Contrary to expectation, our evaluation on MTEB and BRIGHT reveals a **null effect**: embedding models initialized from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes. To unpack this paradox, we introduce **H**ierarchical **R**epresentation **S**imilarity **A**nalysis (HRSA), a framework that decomposes similarity across the representation, geometry, and function levels. HRSA reveals that while RLVR induces an irreversible reorganization of the latent manifold's local geometry and a reversible drift of the coordinate basis, it preserves the global manifold geometry and the linear readout. Consequently, subsequent contrastive learning drives strong alignment between base- and reasoning-initialized models, a phenomenon we term **Manifold Realignment**. Empirically, our findings suggest that unlike Supervised Fine-Tuning (SFT), RLVR optimizes trajectories within an existing semantic landscape rather than fundamentally restructuring the landscape itself.
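The abstract's representation-level comparison can be illustrated with a standard similarity measure. The paper's exact HRSA metrics are not specified here; the sketch below uses linear Centered Kernel Alignment (CKA), a common choice for comparing hidden representations of two models on the same inputs, as a hypothetical stand-in for the representation level of such an analysis.

```python
import numpy as np

def _center_gram(K):
    """Double-center a Gram matrix: HKH with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n_samples, d1) activations from model A on a probe set.
    Y: (n_samples, d2) activations from model B on the same probe set.
    Returns a scalar in [0, 1]; 1 means identical up to rotation/scale.
    """
    Kx = _center_gram(X @ X.T)
    Ky = _center_gram(Y @ Y.T)
    hsic = np.sum(Kx * Ky)  # unnormalized HSIC estimate
    return hsic / (np.linalg.norm(Kx) * np.linalg.norm(Ky))
```

Because CKA is invariant to orthogonal transformations of the feature axes, it would score the "reversible coordinate basis drift" described in the abstract as near-identical, while genuine local-geometry reorganization would lower the score.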