MemReranker: Reasoning-Aware Reranking for Agent Memory Retrieval

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional reranking models rely primarily on semantic matching. They lack reasoning capabilities such as temporal constraint modeling, causal inference, and coreference resolution, and their relevance scores are poorly calibrated. To address these issues, the authors propose the MemReranker series (0.6B/4B), built on Qwen3-Reranker and trained via multi-stage knowledge distillation on general corpora combined with multi-turn dialogue data from memory-intensive scenarios. The approach combines multi-teacher pairwise ranking, which generates well-calibrated soft labels, with pointwise binary cross-entropy distillation and InfoNCE-based contrastive learning, which improves discrimination on hard negatives. Experiments show that MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models and GPT-4o-mini on key metrics. MemReranker-4B achieves a MAP of 0.737, with several metrics on par with Gemini-3-Flash, at only 10–20% of the inference latency of large models. On finance and healthcare benchmarks, the models generalize on par with mainstream large-parameter rerankers.

📝 Abstract

In agent memory systems, the reranking model serves as the critical bridge connecting user queries with long-term memory. Most systems adopt the "retrieve-then-rerank" two-stage paradigm, but generic reranking models rely on semantic similarity matching and lack genuine reasoning capabilities. As a result, recalled results can be semantically close to the query yet miss the key information needed to answer it. This deficiency manifests in memory scenarios as three specific problems. First, relevance scores are miscalibrated, making threshold-based filtering difficult. Second, ranking degrades on complex queries involving temporal constraints, causal reasoning, and similar demands. Third, the model cannot leverage dialogue context for semantic disambiguation. This report introduces MemReranker, a reranking model family (0.6B/4B) built on Qwen3-Reranker through multi-stage LLM knowledge distillation. Multi-teacher pairwise comparisons generate calibrated soft labels, BCE pointwise distillation establishes well-distributed scores, and InfoNCE contrastive learning enhances hard-sample discrimination. Training data combines general corpora with memory-specific multi-turn dialogue data covering temporal constraints, causal reasoning, and coreference resolution. On the memory retrieval benchmark, MemReranker-0.6B substantially outperforms BGE-Reranker and matches open-source 4B/8B models as well as GPT-4o-mini on key metrics. MemReranker-4B further achieves 0.737 MAP, with several metrics on par with Gemini-3-Flash, while keeping inference latency at only 10–20% of that of large models. On finance and healthcare vertical-domain benchmarks, the models preserve generalization capabilities on par with mainstream large-parameter rerankers.
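
To make the three training signals named in the abstract concrete, here is a minimal pure-Python sketch of each in isolation. It is an illustration under stated assumptions, not the authors' implementation: the function names (`win_rate_soft_label`, `bce_soft`, `info_nce`), the use of a simple teacher win rate as the soft label, and the temperature parameter are all hypothetical, and the abstract does not specify how the stages are scheduled or weighted.

```python
import math


def win_rate_soft_label(votes):
    """Assumed aggregation: fraction of teacher pairwise comparisons a candidate wins.

    `votes` is a list of 0/1 outcomes (1 = a teacher preferred this candidate).
    The abstract does not state the actual aggregation rule.
    """
    return sum(votes) / len(votes)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce_soft(logit, soft_label):
    """Pointwise binary cross-entropy between a student logit and a soft label in [0, 1]."""
    # Clamp the probability so the logarithms stay finite.
    p = min(max(sigmoid(logit), 1e-7), 1.0 - 1e-7)
    return -(soft_label * math.log(p) + (1.0 - soft_label) * math.log(1.0 - p))


def info_nce(pos_logit, neg_logits, temperature=1.0):
    """InfoNCE: negative log-softmax of the positive among the positive and its negatives."""
    scaled = [pos_logit / temperature] + [n / temperature for n in neg_logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scaled))
    return log_z - scaled[0]


if __name__ == "__main__":
    label = win_rate_soft_label([1, 1, 0, 1])
    print("soft label:", label)
    print("BCE loss:", bce_soft(1.2, label))
    print("InfoNCE loss:", info_nce(2.0, [1.5, -0.3]))
```

In this sketch the BCE term pulls each score toward a graded target, which is what makes a fixed relevance threshold meaningful, while the InfoNCE term only cares about the positive outranking its hard negatives.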

Problem

Research questions and friction points this paper is trying to address.

reranking

reasoning

agent memory

semantic disambiguation

temporal constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

reasoning-aware reranking

knowledge distillation

memory retrieval