🤖 AI Summary
To address the inefficiency and weak contextual modeling of LLM-driven sparse retrieval in conversational search, this paper proposes a relaxed knowledge distillation framework. Departing from conventional representation-level distillation, it introduces a similarity-level relaxed distillation objective that directly transfers soft conversation–document similarity scores produced by multiple large language model teachers (e.g., GPT-4, Claude). The framework jointly optimizes contrastive and distillation losses while enforcing explicit sparsity constraints, enabling controllable sparsification. Built on the Learned Sparse Retrieval (LSR) architecture, the method is evaluated on five standard conversational search benchmarks. Results show up to a six-point gain in out-of-domain recall over state-of-the-art baselines, and the multi-teacher variant surpasses its individual teachers in in-domain settings. The method also supports fine-grained sparsity control, balancing retrieval accuracy against computational efficiency.
📝 Abstract
Conversational Search (CS) involves retrieving relevant documents from a corpus while accounting for the conversational context, integrating retrieval with context modeling. Recent advancements in Large Language Models (LLMs) have significantly enhanced CS by enabling query rewriting based on conversational context. However, employing LLMs at inference time poses efficiency challenges. Existing solutions mitigate this issue by distilling embeddings derived from human-rewritten queries, focusing primarily on learning the context modeling task. These methods, however, often separate the contrastive retrieval task from the distillation process, treating it as an independent loss term. To overcome these limitations, we introduce DiSCo (Distillation of Sparse Conversational retrieval), a novel approach that unifies retrieval and context modeling through a relaxed distillation objective. Instead of relying exclusively on representation learning, our method distills similarity scores between conversations and documents, providing more freedom in the representation space and better leveraging the contrastive nature of document relevance. Extensive experiments on Learned Sparse Retrieval (LSR) across five CS datasets demonstrate that DiSCo achieves substantial improvements in both in-domain and out-of-domain retrieval, with up to a six-point gain in recall on out-of-domain datasets over state-of-the-art methods. Additionally, DiSCo employs a multi-teacher distillation strategy, using multiple LLMs as teachers, further enhancing performance and surpassing the individual teachers in in-domain settings. Furthermore, analysis of model sparsity reveals that DiSCo allows for more effective control over the sparsity of the trained models.
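The core idea, distilling conversation–document similarity scores rather than teacher embeddings, can be sketched as a combined training objective. The snippet below is an illustrative PyTorch sketch, not the paper's exact formulation: the function name `disco_style_loss`, the temperature, and the loss weighting are assumptions made here for clarity, and the sparsity regularizer described in the paper is omitted.

```python
import torch
import torch.nn.functional as F

def disco_style_loss(student_sims, teacher_sims_list, labels,
                     distill_weight=1.0, temperature=1.0):
    """Illustrative combined objective (hypothetical sketch):
    a contrastive loss on the student's conversation-document
    similarity scores, plus a similarity-level distillation term
    averaged over multiple LLM teachers.

    student_sims: (batch, n_docs) student similarity scores
    teacher_sims_list: list of (batch, n_docs) teacher scores
    labels: (batch,) index of the relevant document per conversation
    """
    # Contrastive (InfoNCE-style) loss: similarities act as logits
    # over candidate documents for each conversation.
    contrastive = F.cross_entropy(student_sims, labels)

    # Relaxed distillation: match the student's similarity
    # distribution to each teacher's, instead of forcing the
    # student's representations to match teacher embeddings.
    log_p_student = F.log_softmax(student_sims / temperature, dim=-1)
    distill = 0.0
    for teacher_sims in teacher_sims_list:
        p_teacher = F.softmax(teacher_sims / temperature, dim=-1)
        distill = distill + F.kl_div(log_p_student, p_teacher,
                                     reduction="batchmean")
    distill = distill / len(teacher_sims_list)

    return contrastive + distill_weight * distill
```

Because only similarity distributions are matched, the student keeps freedom in how its sparse representations realize those similarities, which is the "relaxation" the abstract refers to.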