Why These Documents? Explainable Generative Retrieval with Hierarchical Category Paths

📅 2024-11-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing generative retrieval methods decode document IDs (docids) directly, without any interpretable justification for why a particular document is retrieved. Method: We propose HyPE, the first framework to use hierarchical category paths as the carrier of interpretability. Given a query, HyPE progressively generates a semantically grounded path, from broad to fine-grained categories, and then decodes the target docid conditioned on that path, so that each retrieval can be attributed to a semantic route. HyPE supports query-adaptive explanations that may differ for the same document yet remain structurally consistent, and it adds path-aware re-ranking to improve retrieval accuracy. Contribution/Results: By combining an external semantic hierarchy, LLM-driven selection of candidate paths, path-augmented fine-tuning, and path-aware re-ranking, HyPE achieves significant gains in retrieval accuracy across multiple benchmarks. It also delivers human-understandable, semantically coherent, fine-grained, and controllable explanations, bridging performance and interpretability in generative retrieval.
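
A minimal Python sketch of the coarse-to-fine decoding order described above. The toy hierarchy, the docids, and the word-overlap `overlap_score` function are hypothetical stand-ins for HyPE's external semantic hierarchy and trained model, and the walk is greedy (one category per level) rather than whatever search HyPE actually uses. The point is only that the docid is chosen after, and restricted by, the generated path.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy hierarchy: each internal node maps to its child categories.
# Leaves map to the docids filed under them. Illustrative only.
HIERARCHY: Dict[str, List[str]] = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Football", "Tennis"],
}
LEAF_DOCS: Dict[str, List[str]] = {
    "Physics": ["doc_quantum", "doc_optics"],
    "Biology": ["doc_cells"],
    "Football": ["doc_worldcup"],
    "Tennis": ["doc_wimbledon"],
}


def decode_path_then_docid(
    query: str,
    score: Callable[[str, str], float],
) -> Tuple[List[str], str]:
    """Greedily pick one category per level (broad to fine), then pick a
    docid restricted to the chosen leaf. `score` stands in for the model."""
    path: List[str] = []
    node = "ROOT"
    while node in HIERARCHY:  # descend until we reach a leaf category
        node = max(HIERARCHY[node], key=lambda c: score(query, c))
        path.append(node)
    # The docid is only chosen among documents under the generated path.
    docid = max(LEAF_DOCS[node], key=lambda d: score(query, d))
    return path, docid


def overlap_score(query: str, label: str) -> float:
    """Hypothetical toy scorer: number of query words contained in the label."""
    words = query.lower().split()
    return float(sum(w in label.lower() for w in words))
```

For the query "science physics quantum" this sketch returns the path ["Science", "Physics"] and the docid "doc_quantum"; the path itself serves as the explanation for the retrieved document.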

📝 Abstract

Generative retrieval has recently emerged as an alternative to traditional information retrieval approaches. However, existing generative retrieval methods directly decode a docid when given a query, making it impossible to give users an answer to "Why was this document retrieved?". To address this limitation, we propose Hierarchical Category Path-Enhanced Generative Retrieval (HyPE), which enhances explainability by generating hierarchical category paths step by step before decoding the docid. HyPE uses these hierarchical category paths as explanations, progressing from broad to specific semantic categories. Because the explanations are built from category paths shared between the query and the document, the same document can receive different explanations depending on the query, and the coarse-to-fine structure yields reasonable explanations that reflect the document's semantic structure. HyPE constructs category paths from an external, high-quality semantic hierarchy, leverages an LLM to select appropriate candidate paths for each document, and optimizes the generative retrieval model on a path-augmented dataset. During inference, HyPE applies a path-aware reranking strategy that aggregates diverse topic information, so that the most relevant documents are prioritized in the final ranked list of docids. Our extensive experiments demonstrate that HyPE not only offers a high level of explainability but also improves retrieval performance on the document retrieval task.
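
One plausible shape for the path-augmented dataset, sketched in Python under stated assumptions: the " > " and " || " separators, the `Document` record, and the word-overlap rule used to pick a query-specific path are hypothetical choices for illustration. The candidate paths are supplied directly here, whereas HyPE selects them with an LLM.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SEP = " > "        # assumed separator between category levels
DOC_MARK = " || "  # assumed marker between the path and the docid


@dataclass
class Document:
    docid: str
    text: str
    candidate_paths: List[List[str]]  # chosen by an LLM in HyPE; given here


def target_string(path: List[str], docid: str) -> str:
    """Serialize a coarse-to-fine category path followed by the docid."""
    return SEP.join(path) + DOC_MARK + docid


def build_path_augmented_pairs(
    docs: List[Document],
    queries: List[Tuple[str, str]],  # (query, relevant docid)
) -> List[Tuple[str, str]]:
    by_id: Dict[str, Document] = {d.docid: d for d in docs}
    pairs: List[Tuple[str, str]] = []
    # Indexing pairs: document text mapped to each of its candidate paths.
    for d in docs:
        for path in d.candidate_paths:
            pairs.append((d.text, target_string(path, d.docid)))
    # Retrieval pairs: query mapped to the candidate path sharing the most
    # words with it (a hypothetical stand-in for query-specific path choice).
    for query, docid in queries:
        doc = by_id[docid]
        q_words = set(query.lower().split())
        best = max(
            doc.candidate_paths,
            key=lambda p: len(q_words & {w.lower() for c in p for w in c.split()}),
        )
        pairs.append((query, target_string(best, docid)))
    return pairs
```

For a document with the candidate paths Science > Physics > Optics and Science > Physics > Quantum mechanics, the query "what is quantum mechanics" is paired with the target "Science > Physics > Quantum mechanics || d1", while both paths still appear in the indexing pairs. This is how one document can carry different explanations for different queries.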

Problem

Enhancing explainability in generative document retrieval

Generating hierarchical category paths before document decoding

Improving retrieval performance with path-aware reranking strategy

Innovation

Generates hierarchical category paths for explainability

Uses LLM to select candidate paths for documents

Applies path-aware reranking for better document prioritization
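
Path-aware reranking is described as aggregating diverse topic information across paths. The Python sketch below assumes one simple aggregation rule: summing the probabilities of all beam hypotheses (path, docid, log-probability) that end in the same docid. The `Beam` type and this rule are illustrative assumptions, not HyPE's published scoring function.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical beam hypothesis: (category path, docid, log-probability).
Beam = Tuple[Tuple[str, ...], str, float]


def path_aware_rerank(
    beams: List[Beam],
) -> List[Tuple[str, float, List[Tuple[str, ...]]]]:
    """Rank docids by the summed probability of every path that reached them."""
    mass: Dict[str, float] = defaultdict(float)
    paths: Dict[str, List[Tuple[str, ...]]] = defaultdict(list)
    for path, docid, logp in beams:
        mass[docid] += math.exp(logp)  # assumed rule: add probability mass
        paths[docid].append(path)
    ranked = sorted(mass.items(), key=lambda kv: kv[1], reverse=True)
    # Each result carries its aggregated score and the paths that explain it.
    return [(d, s, paths[d]) for d, s in ranked]
```

With hypotheses d1 via Science > Physics (0.30), d1 via Science > Astronomy (0.25), d2 via Science > Physics (0.40), and d3 via Sports > Tennis (0.05), d1 ranks first with a combined 0.55 even though d2 has the single best hypothesis, and d1's two paths are returned as its explanation.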

🔎 Similar Papers

ir_explain: a Python Library of Explainable IR Methods