HiNS: Hierarchical Negative Sampling for More Comprehensive Memory Retrieval Embedding Model

📅 2026-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing memory retrieval embedding models: their training procedures neglect the hierarchical difficulty of negative samples and their natural distribution in human–machine conversations, thereby constraining discriminative performance. To overcome this, we propose HiNS (Hierarchical Negative Sampling), a novel framework that systematically models both the difficulty tiers of negative samples and their empirical proportions in real dialogues, enabling a more realistic and effective negative sampling strategy aligned with actual interaction scenarios. Extensive experiments demonstrate that HiNS substantially enhances the fine-grained discriminative capacity of embedding models, achieving significant improvements on the LoCoMo and PERSONAMEM benchmarks—with gains of up to 3.27% in F1 score, 3.30% in BLEU-1, and 2.55% in overall score.

📝 Abstract
Memory-augmented language agents rely on embedding models for effective memory retrieval. However, existing training data construction overlooks a critical limitation: the hierarchical difficulty of negative samples and their natural distribution in human–agent interactions. In practice, some negatives are semantically close distractors while others are trivially irrelevant, and natural dialogue exhibits structured proportions of these types. Current approaches using synthetic or uniformly sampled negatives fail to reflect this diversity, limiting embedding models' ability to learn the nuanced discrimination essential for robust memory retrieval. In this work, we propose HiNS, a principled data construction framework that explicitly models negative sample difficulty tiers and incorporates empirically grounded negative ratios derived from conversational data, enabling the training of embedding models with substantially improved retrieval fidelity and generalization on memory-intensive tasks. Experiments show significant improvements: on LoCoMo, F1/BLEU-1 gains of 3.27%/3.30% (MemoryOS) and 1.95%/1.78% (Mem0); on PERSONAMEM, total score improvements of 1.19% (MemoryOS) and 2.55% (Mem0).
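The core idea of the abstract — bucket candidate negatives into difficulty tiers and then sample them according to fixed, dialogue-derived proportions — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tier thresholds and the mixing ratios in `TIER_RATIOS` are hypothetical placeholders for the empirically grounded values HiNS derives from conversational data.

```python
import random

# Hypothetical mixing ratios; HiNS derives its actual proportions
# empirically from human-agent conversations.
TIER_RATIOS = {"hard": 0.5, "medium": 0.3, "easy": 0.2}

def assign_tier(similarity):
    """Bucket a negative by its similarity to the query (thresholds assumed)."""
    if similarity >= 0.7:
        return "hard"    # semantically close distractor
    if similarity >= 0.4:
        return "medium"
    return "easy"        # trivially irrelevant

def sample_negatives(scored_candidates, k, rng=None):
    """Draw k negatives whose tier proportions follow TIER_RATIOS.

    scored_candidates: list of (text, similarity_to_query) pairs.
    """
    rng = rng or random.Random(0)
    tiers = {"hard": [], "medium": [], "easy": []}
    for text, sim in scored_candidates:
        tiers[assign_tier(sim)].append(text)

    picked = []
    for tier, ratio in TIER_RATIOS.items():
        quota = min(round(k * ratio), len(tiers[tier]))
        picked.extend(rng.sample(tiers[tier], quota))
    # Top up from any tier if rounding or a scarce tier left us short.
    leftover = [t for pool in tiers.values() for t in pool if t not in picked]
    while len(picked) < k and leftover:
        picked.append(leftover.pop())
    return picked
```

The sampled mixture would then feed a standard contrastive objective (e.g. an InfoNCE-style loss) during embedding-model training, so each query sees hard distractors and easy irrelevancies in realistic proportions rather than uniformly at random.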
Problem

Research questions and friction points this paper is trying to address.

negative sampling
memory retrieval
embedding model
hierarchical difficulty
conversational data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Negative Sampling
Memory Retrieval
Embedding Model
Negative Sample Difficulty
Conversational Data