🤖 AI Summary
Large language models (LLMs) exhibit poor robustness in multiple-choice question answering (MCQA), being highly sensitive to input perturbations. To address this, we propose Token Constraint Decoding (TCD), a post-hoc, model-agnostic decoding method that enforces token-level prediction consistency without fine-tuning. TCD introduces, for the first time, a token-level prediction alignment mechanism into the decoding process, combining dynamic logit penalization with prompt engineering. Experiments show that TCD effectively mitigates overconfident predictions, though different models require distinct penalty schedules. On benchmarks including CommonsenseQA, MMLU, and MMLU-Pro, TCD boosts absolute accuracy by up to 39% for weaker models (e.g., Gemma 3 1B) under noisy inputs, significantly improving inference stability in realistic scenarios with imperfect inputs.
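The summary does not spell out the exact algorithm, but the core idea of constraining decoding via logit penalization can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `token_constraint_decode`, the dict-based logits, and the fixed `penalty` value are all assumptions made for clarity.

```python
def token_constraint_decode(logits, valid_token_ids, penalty=10.0):
    """Softly constrain a prediction to a set of valid answer tokens.

    `logits` maps token id -> raw logit (a stand-in for a model's
    vocabulary-sized output vector). `valid_token_ids` holds the ids of
    the MCQA option tokens (e.g. "A"-"D"). Every out-of-set logit is
    reduced by `penalty`, suppressing off-option predictions.
    """
    constrained = {
        tok: (logit if tok in valid_token_ids else logit - penalty)
        for tok, logit in logits.items()
    }
    # Greedy pick over the penalized logits.
    return max(constrained, key=constrained.get)

# Toy example: token 7 is an off-option distractor with the highest raw
# logit; the penalty suppresses it so option token 42 is selected.
logits = {42: 1.5, 7: 2.0, 13: 0.1}
print(token_constraint_decode(logits, valid_token_ids={42, 13}))  # → 42
```

In a real decoding loop the same penalization would be applied to the model's logit vector at the answer position before sampling or argmax.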
📝 Abstract
Large Language Models (LLMs) have demonstrated impressive performance on multiple-choice question answering (MCQA) benchmarks, yet they remain highly vulnerable to minor input perturbations. In this paper, we introduce and evaluate Token Constraint Decoding (TCD), a simple yet effective inference-time algorithm that enforces alignment between token-level predictions to enhance robustness in noisy settings. Through extensive experiments on CommonsenseQA, MMLU, and MMLU-Pro, we show that TCD, especially when paired with prompt engineering (PE) fixes, significantly restores performance degraded by input noise, yielding up to +39% absolute gains for weaker models like Gemma 3 1B. Penalty sweep analyses further reveal that TCD implicitly regularizes overconfident outputs, with different models requiring distinct penalty schedules to maximize resilience. Our findings establish TCD as a practical, model-agnostic approach for improving reasoning stability under real-world imperfections and pave the way for more reliable deployment of LLMs in safety-critical or user-facing applications.
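The penalty sweep mentioned above amounts to selecting, per model, the penalty strength that maximizes accuracy on held-out data. A hedged sketch of that selection loop, with an invented helper `constrained_argmax`, toy logits, and an arbitrary penalty grid standing in for the paper's actual schedule:

```python
def constrained_argmax(logits, valid_ids, penalty):
    # Penalize out-of-set tokens, then pick the highest-scoring token.
    return max(logits, key=lambda t: logits[t] - (0.0 if t in valid_ids else penalty))

def sweep_penalty(dev_set, valid_ids, penalties=(0.0, 1.0, 5.0, 10.0)):
    """Return the penalty value with the best dev-set accuracy."""
    def accuracy(p):
        return sum(
            constrained_argmax(logits, valid_ids, p) == gold
            for logits, gold in dev_set
        ) / len(dev_set)
    return max(penalties, key=accuracy)

# Toy dev set of (logits, gold token id) pairs. With penalty 0 the
# distractor token 7 wins the first item; any positive penalty on the
# grid recovers the gold answer.
dev = [({42: 1.5, 7: 2.0}, 42), ({13: 0.8, 7: 0.2}, 13)]
print(sweep_penalty(dev, valid_ids={42, 13}))  # → 1.0
```

Because the best penalty differs across models, this sweep would be run once per model rather than fixed globally.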