Mind the Quote: Enabling Quotation-Aware Dialogue in LLMs via Plug-and-Play Modules

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) lack the capability to locate and respond appropriately to explicit references in dialogue (e.g., “verify using the formula I just emphasized”). This work introduces the first reference-aware dialogue formalization and benchmark, along with a span-conditioned generation framework. We propose QuAda—a plug-and-play, zero-overhead, parameter-efficient module (<2.8% trainable weights)—that dynamically enhances attention focus on referenced spans via bottleneck-projected attention modulation. To train QuAda robustly, we devise a multi-stage consistency-verified data synthesis strategy. Evaluated on a novel benchmark covering five canonical reference scenarios, QuAda significantly outperforms strong baselines across diverse LLMs, demonstrating strong cross-model generalization. It supports unseen topics without modifying original prompts and requires no architectural changes to the base model.
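The summary mentions a multi-stage consistency-verified data synthesis strategy. As a minimal sketch of the underlying idea (the function name, majority-vote rule, and threshold are illustrative assumptions, not the paper's exact pipeline), one stage might keep a synthesised QA pair only when independent answer samples agree often enough:

```python
from collections import Counter
from typing import Optional

def keep_if_consistent(candidate_answers: list, min_agree: float = 0.7) -> Optional[str]:
    """Toy consistency filter: accept a synthesised answer only if the
    majority answer among independent samples reaches the agreement
    threshold; otherwise discard the example."""
    if not candidate_answers:
        return None
    answer, votes = Counter(candidate_answers).most_common(1)[0]
    return answer if votes / len(candidate_answers) >= min_agree else None

# A 3-of-4 majority (0.75) clears the 0.7 threshold; a 1-of-2 split does not.
kept = keep_if_consistent(["42", "42", "42", "7"])
dropped = keep_if_consistent(["a", "b"])
```

The real pipeline layers several such checks; this shows only the answer-agreement step.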

📝 Abstract
Human-AI conversation frequently relies on quoting earlier text ("check it with the formula I just highlighted"), yet today's large language models (LLMs) lack an explicit mechanism for locating and exploiting such spans. We formalise the challenge as span-conditioned generation, decomposing each turn into the dialogue history, a set of token-offset quotation spans, and an intent utterance. Building on this abstraction, we introduce a quotation-centric data pipeline that automatically synthesises task-specific dialogues, verifies answer correctness through multi-stage consistency checks, and yields both a heterogeneous training corpus and the first benchmark covering five representative scenarios. To meet the benchmark's zero-overhead and parameter-efficiency requirements, we propose QuAda, a lightweight training-based method that attaches two bottleneck projections to every attention head, dynamically amplifying or suppressing attention to quoted spans at inference time while leaving the prompt unchanged and updating <2.8% of backbone weights. Experiments across models show that QuAda is suitable for all scenarios and generalises to unseen topics, offering an effective, plug-and-play solution for quotation-aware dialogue.
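The abstract's decomposition of a turn into dialogue history, token-offset quotation spans, and an intent utterance can be sketched as a simple data structure (names and whitespace tokenisation are illustrative assumptions, not the paper's format):

```python
from dataclasses import dataclass

@dataclass
class QuotedTurn:
    history: list                 # prior utterances, oldest first
    spans: list                   # (utterance_idx, start_tok, end_tok), end exclusive
    intent: str                   # the current request, e.g. a verification ask

    def quoted_text(self):
        """Resolve each token-offset span back to its surface text."""
        out = []
        for u, s, e in self.spans:
            toks = self.history[u].split()
            out.append(" ".join(toks[s:e]))
        return out

turn = QuotedTurn(
    history=["the area formula is A = pi * r ** 2", "ok noted"],
    spans=[(0, 4, 11)],           # tokens 4..10 of utterance 0
    intent="check it with the formula I just highlighted",
)
```

A generator conditioned on such a turn sees the quoted span explicitly rather than having to infer the referent of "the formula".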
Problem

Research questions and friction points this paper is trying to address.

LLMs lack explicit mechanisms for quoting earlier text
Need for span-conditioned generation in dialogue systems
Requirement for efficient, plug-and-play quotation-aware solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play modules for quotation-aware dialogue
Lightweight QuAda method with bottleneck projections
Automatic synthesis of task-specific dialogue data
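QuAda attaches bottleneck projections to attention heads so that attention to quoted spans can be amplified or suppressed at inference time. A minimal single-head numpy sketch of that idea (the scalar-gate formulation, shapes, and names are assumptions, not the paper's exact parameterisation):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quoted_span_attention(q, k, v, span_mask, W_down, W_up):
    """One attention head with a bottleneck-gated additive bias on quoted keys.

    q, k, v: (T, d) arrays; span_mask: (T,) set to 1.0 where the token lies
    in a quoted span; W_down (d, r) and W_up (r, 1) form a low-rank
    bottleneck mapping each query to a scalar gate. A positive gate shifts
    attention mass toward quoted keys; a negative gate shifts it away."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)                 # raw attention logits (T, T)
    gate = (q @ W_down) @ W_up                    # per-query scalar gate (T, 1)
    attn = softmax(logits + gate * span_mask[None, :], axis=-1)
    return attn @ v, attn

T, d, r = 6, 8, 2                                 # tokens, head dim, bottleneck rank
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
span_mask = np.array([0, 0, 1, 1, 0, 0], dtype=float)  # tokens 2-3 are "quoted"
W_down = rng.standard_normal((d, r)) * 0.1        # trainable in the real method
W_up = rng.standard_normal((r, 1)) * 0.1

out, attn = quoted_span_attention(q, k, v, span_mask, W_down, W_up)
```

Because the bias enters only the logits, the prompt itself is untouched, matching the plug-and-play, zero-prompt-overhead framing; only the small `W_down`/`W_up` matrices would be trained.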