HEISIR: Hierarchical Expansion of Inverted Semantic Indexing for Training-free Retrieval of Conversational Data using LLMs

📅 2025-03-06

📈 Citations: 0

✨ Influential: 0

career value

178K/year

🤖 AI Summary

Existing conversational retrieval methods rely heavily on large-scale annotation and fine-tuning, compromising the trade-off between semantic modeling accuracy and deployment efficiency. To address this, we propose a zero-shot, zero-annotation hierarchical inverted semantic indexing framework. It extracts SVOA (Subject–Verb–Object–Adjunct) quatruples via dependency parsing and semantic role labeling, and explicitly models temporal, intentional, and contextual adjunct semantics through an Adjunct Augmentation mechanism. Furthermore, we design a plug-and-play retrieval architecture compatible with embeddings from multiple large language models (LLMs), requiring no training. Experiments on multi-turn conversational retrieval show an average 12.7% improvement in Recall@10 and a 68% reduction in latency. The framework has been successfully deployed in three industrial dialogue systems, supporting intent recognition and topic evolution analysis.

Technology Category

Application Category

📝 Abstract

The growth of conversational AI services has increased demand for effective information retrieval from dialogue data. However, existing methods often face challenges in capturing semantic intent or require extensive labeling and fine-tuning. This paper introduces HEISIR (Hierarchical Expansion of Inverted Semantic Indexing for Retrieval), a novel framework that enhances semantic understanding in conversational data retrieval through optimized data ingestion, eliminating the need for resource-intensive labeling or model adaptation. HEISIR implements a two-step process: (1) Hierarchical Triplets Formulation and (2) Adjunct Augmentation, creating semantic indices consisting of Subject-Verb-Object-Adjunct (SVOA) quadruplets. This structured representation effectively captures the underlying semantic information from dialogue content. HEISIR achieves high retrieval performance while maintaining low latency during the actual retrieval process. Our experimental results demonstrate that HEISIR outperforms fine-tuned models across various embedding types and language models. Beyond improving retrieval capabilities, HEISIR also offers opportunities for intent and topic analysis in conversational data, providing a versatile solution for dialogue systems.

Problem

Research questions and friction points this paper is trying to address.

Enhances semantic understanding in conversational data retrieval

Eliminates need for resource-intensive labeling or model adaptation

Improves retrieval performance and reduces latency in dialogue systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Triplets Formulation for semantic indexing

Adjunct Augmentation enhances SVOA quadruplets representation

Optimized data ingestion eliminates need for labeling

🔎 Similar Papers

No similar papers found.