🤖 AI Summary
This work addresses “neutral regression” in large language models: adding external context, even uninformative context, can cause a model to overwrite an output that was already correct. The paper formalizes the issue as a “do-no-harm” requirement and introduces NWCAD (No-Worse Context-Aware Decoding), a training-free, decoding-stage adapter that assesses context utility with a two-stream setup and a two-stage gate. When the context is judged uninformative, NWCAD falls back to context-free decoding; otherwise, it decodes with the context and keeps a CAD-style fallback for uncertain cases. Experiments across multiple benchmarks show that NWCAD prevents accuracy drops on items the baseline model already answers correctly while retaining substantial gains when the context is genuinely helpful.
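The two-stage gate can be illustrated with a minimal sketch for a single decoding step. The text here does not specify the gate criteria, so everything concrete below is an assumption made for illustration: the use of Jensen-Shannon divergence between the two streams as the "is the context informative?" test, the use of entropy as the uncertainty test, the threshold values, and the function name `gated_next_token_logits`. The contrastive combination follows the commonly cited context-aware decoding form, `(1 + alpha) * logits_ctx - alpha * logits_no_ctx`, which may differ from what the paper actually uses.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p: List[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in nats between two distributions."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)


def gated_next_token_logits(
    logits_ctx: List[float],      # stream 1: prompt with context
    logits_no_ctx: List[float],   # stream 2: prompt without context
    info_threshold: float = 0.05,     # hypothetical value
    entropy_threshold: float = 1.0,   # hypothetical value
    alpha: float = 0.5,               # hypothetical contrast strength
) -> Tuple[List[float], str]:
    """Return the logits to sample from and the branch that was taken."""
    p_ctx = softmax(logits_ctx)
    p_no_ctx = softmax(logits_no_ctx)

    # Stage 1 (assumed criterion): if the context barely changes the
    # next-token distribution, treat it as non-informative and back off.
    if js_divergence(p_ctx, p_no_ctx) < info_threshold:
        return list(logits_no_ctx), "no_context"

    # Stage 2 (assumed criterion): if the context-conditioned stream is
    # uncertain, use a CAD-style contrastive combination of the streams.
    if entropy(p_ctx) > entropy_threshold:
        contrast = [
            (1 + alpha) * c - alpha * n
            for c, n in zip(logits_ctx, logits_no_ctx)
        ]
        return contrast, "cad_fallback"

    # Otherwise trust plain context-conditioned decoding.
    return list(logits_ctx), "context"
```

In a real decoder the two logit vectors would come from two forward passes of the same model per step, one with and one without the context; the sketch takes them as plain lists so the branching logic can be read in isolation.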
📝 Abstract
Large language models (LLMs) can answer questions and summarize documents when conditioned on external contexts (e.g., retrieved evidence), yet context use remains unreliable: models may overwrite an already-correct output (neutral regression) even when the context is non-informative. We formalize neutral regression as a do-no-harm requirement and quantify it by measuring accuracy drops on baseline-correct items under answer-consistent contexts. We propose No-Worse Context-Aware Decoding (NWCAD), a decode-time adapter built on a two-stream setup with a two-stage gate: it backs off to no-context decoding when the context is non-informative, and otherwise uses context-conditioned decoding with a CAD-style fallback under uncertainty. We evaluate NWCAD on benchmarks that separate do-no-harm reliability from context utilization (accuracy gains on genuinely helpful contexts). NWCAD prevents neutral regression on baseline-correct items while preserving strong context-driven accuracy on helpful contexts.
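The neutral-regression measurement described in the abstract can be sketched as a small helper. This is an illustrative reading rather than the paper's evaluation code: it assumes per-item boolean correctness flags, and the function name `neutral_regression` is hypothetical. Because accuracy on baseline-correct items is 1.0 by construction without context, the accuracy drop reduces to one minus the accuracy on those same items once an answer-consistent context is added.

```python
from typing import Sequence


def neutral_regression(
    baseline_correct: Sequence[bool],
    context_correct: Sequence[bool],
) -> float:
    """Accuracy drop on baseline-correct items after adding context.

    baseline_correct[i]: item i answered correctly with no context.
    context_correct[i]:  item i answered correctly with an
                         answer-consistent context attached.
    """
    if len(baseline_correct) != len(context_correct):
        raise ValueError("both sequences must cover the same items")

    # Keep only the items the baseline already gets right.
    kept = [c for b, c in zip(baseline_correct, context_correct) if b]
    if not kept:
        return 0.0  # nothing to regress on

    # Baseline accuracy on this subset is 1.0, so the drop is 1 - accuracy.
    return 1.0 - sum(kept) / len(kept)
```

A value of 0.0 means the method did no harm on items the baseline already answered correctly; larger values mean more correct answers were overwritten.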