HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing memory retrieval embedding models: their training procedures neglect the hierarchical difficulty of negative samples and their natural distribution in human–machine conversations, thereby constraining discriminative performance. To overcome this, we propose HiNS (Hierarchical Negative Sampling), a novel framework that systematically models both the difficulty tiers of negative samples and their empirical proportions in real dialogues, enabling a more realistic and effective negative sampling strategy aligned with actual interaction scenarios. Extensive experiments demonstrate that HiNS substantially enhances the fine-grained discriminative capacity of embedding models, achieving significant improvements on the LoCoMo and PERSONAMEM benchmarks—with gains of up to 3.27% in F1 score, 3.30% in BLEU-1, and 2.55% in overall score.

📝 Abstract
Memory-augmented language agents rely on embedding models for effective memory retrieval. However, existing training data construction overlooks a critical limitation: the hierarchical difficulty of negative samples and their natural distribution in human–agent interactions. In practice, some negatives are semantically close distractors while others are trivially irrelevant, and natural dialogue exhibits structured proportions of these types. Current approaches using synthetic or uniformly sampled negatives fail to reflect this diversity, limiting embedding models' ability to learn the nuanced discrimination essential for robust memory retrieval. In this work, we propose HiNS, a principled data construction framework that explicitly models negative sample difficulty tiers and incorporates empirically grounded negative ratios derived from conversational data, enabling the training of embedding models with substantially improved retrieval fidelity and generalization on memory-intensive tasks. Experiments show significant improvements: on LoCoMo, F1/BLEU-1 gains of 3.27%/3.30% (MemoryOS) and 1.95%/1.78% (Mem0); on PERSONAMEM, total score improvements of 1.19% (MemoryOS) and 2.55% (Mem0).
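The core idea of the abstract — bucket candidate negatives into difficulty tiers and then sample them according to fixed, dialogue-derived proportions — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tier thresholds and the mixing ratios in `TIER_RATIOS` are hypothetical placeholders for the empirically grounded values HiNS derives from conversational data.

```python
import random

# Hypothetical mixing ratios; HiNS derives its actual proportions
# empirically from human-agent conversations.
TIER_RATIOS = {"hard": 0.5, "medium": 0.3, "easy": 0.2}

def assign_tier(similarity):
    """Bucket a negative by its similarity to the query (thresholds assumed)."""
    if similarity >= 0.7:
        return "hard"    # semantically close distractor
    if similarity >= 0.4:
        return "medium"
    return "easy"        # trivially irrelevant

def sample_negatives(scored_candidates, k, rng=None):
    """Draw k negatives whose tier proportions follow TIER_RATIOS.

    scored_candidates: list of (text, similarity_to_query) pairs.
    """
    rng = rng or random.Random(0)
    tiers = {"hard": [], "medium": [], "easy": []}
    for text, sim in scored_candidates:
        tiers[assign_tier(sim)].append(text)

    picked = []
    for tier, ratio in TIER_RATIOS.items():
        quota = min(round(k * ratio), len(tiers[tier]))
        picked.extend(rng.sample(tiers[tier], quota))
    # Top up from any tier if rounding or a scarce tier left us short.
    leftover = [t for pool in tiers.values() for t in pool if t not in picked]
    while len(picked) < k and leftover:
        picked.append(leftover.pop())
    return picked
```

The sampled mixture would then feed a standard contrastive objective (e.g. an InfoNCE-style loss) during embedding-model training, so each query sees hard distractors and easy irrelevancies in realistic proportions rather than uniformly at random.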
Problem

Research questions and friction points this paper is trying to address.

negative sampling
memory retrieval
embedding model
hierarchical difficulty
conversational data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Negative Sampling
Memory Retrieval
Embedding Model
Negative Sample Difficulty
Conversational Data