Multi-Prompting Decoder Helps Better Language Understanding

📅 2024-06-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
In Model-as-a-Service (MaaS) settings, pre-trained language models (PLMs) are exposed only through inference APIs and cannot be fine-tuned, while single-prompt decoding suffers from high prompt sensitivity and poor generalization, especially under few-shot conditions. To address these challenges, the paper proposes the Multi-Prompting Decoder (MPD), a lightweight framework that requires no parameter updates or gradient computation. MPD queries the frozen PLM with multiple different prompts per input sample to elicit diverse output hidden states and class scores. It then decodes the hidden states with an optimal transport-based alignment mechanism and aggregates the per-prompt class scores through a calibrated decoding module. Evaluated on multiple few-shot natural language understanding (NLU) benchmarks, MPD consistently outperforms strong single-prompt baselines and establishes new state-of-the-art results, mitigating both prompt sensitivity and data scarcity.
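The query-and-aggregate step described above can be sketched as follows. `query_plm` is a hypothetical stand-in for the frozen model's inference API (simulated here with fixed random logits), and the prompt templates are illustrative, not taken from the paper:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-in for a frozen PLM's inference API: returns class
# logits for a prompt-wrapped input (deterministic per input string).
def query_plm(prompted_text, num_classes=3):
    seed = abs(hash(prompted_text)) % (2**32)
    return np.random.default_rng(seed).normal(size=num_classes)

# Multiple semantically different prompt templates for the same sample.
prompts = [
    "Review: {x} Sentiment:",
    "{x} Overall, it was",
    "Question: is '{x}' positive?",
]
x = "a delightful film"

# Query once per prompt, then aggregate the per-prompt class scores.
probs = np.stack([softmax(query_plm(p.format(x=x))) for p in prompts])
agg = probs.mean(axis=0)   # simple average over prompts
pred = int(agg.argmax())
```

A plain average is used for aggregation here; the paper additionally calibrates the class scores before combining them.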

📝 Abstract
Recent Pre-trained Language Models (PLMs) usually only provide users with the inference APIs, namely the emerging Model-as-a-Service (MaaS) setting. To adapt MaaS PLMs to downstream tasks without accessing their parameters and gradients, some existing methods focus on the output-side adaptation of PLMs, viewing the PLM as an encoder and then optimizing a task-specific decoder for decoding the output hidden states and class scores of the PLM. Despite the effectiveness of these methods, they only use a single prompt to query PLMs for decoding, leading to a heavy reliance on the quality of the adopted prompt. In this paper, we propose a simple yet effective Multi-Prompting Decoder (MPD) framework for MaaS adaptation. The core idea is to query PLMs with multiple different prompts for each sample, thereby obtaining multiple output hidden states and class scores for subsequent decoding. Such a multi-prompting decoding paradigm can simultaneously mitigate reliance on the quality of a single prompt, alleviate the issue of data scarcity under the few-shot setting, and provide richer knowledge extracted from PLMs. Specifically, we propose two decoding strategies: multi-prompting decoding with optimal transport for hidden states and calibrated decoding for class scores. Extensive experiments demonstrate that our method achieves new state-of-the-art results on multiple natural language understanding datasets under the few-shot setting.
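One way to read the "calibrated decoding for class scores" idea is calibration against a content-free prior before averaging across prompts (an assumed scheme in the spirit of contextual calibration, not necessarily the paper's exact module). All numbers below are toy values:

```python
import numpy as np

# Per-prompt class probabilities for one sample (toy numbers);
# rows = prompts, columns = classes.
probs = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
])

# A frozen PLM is often biased toward certain label words. One simple
# calibration divides by the probability the model assigns to each class
# on a content-free input (e.g. "N/A"); this prior vector is illustrative.
prior = np.array([0.5, 0.3, 0.2])

calibrated = probs / prior
calibrated /= calibrated.sum(axis=1, keepdims=True)  # renormalize per prompt
agg = calibrated.mean(axis=0)                        # aggregate over prompts
pred = int(agg.argmax())                             # -> class 0 here
```

Note how the second prompt, whose raw scores exactly mirror the prior, contributes a uniform (uninformative) vote after calibration.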
Problem

Research questions and friction points this paper is trying to address.

Adapting MaaS PLMs to downstream tasks without access to parameters or gradients
Heavy reliance on the quality of a single prompt when querying PLMs
Data scarcity under the few-shot setting, limiting knowledge extraction from PLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Prompting Decoder for MaaS adaptation
Optimal transport for hidden states
Calibrated decoding for class scores
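The optimal-transport idea above can be sketched with entropy-regularized Sinkhorn iterations aligning the hidden states elicited by two different prompts (an assumed instantiation for illustration; the paper's exact formulation may differ):

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.1, iters=500):
    """Entropy-regularized OT plan between weight vectors a and b."""
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan P

# Toy hidden states from two different prompts (rows = states, cols = dims).
rng = np.random.default_rng(1)
H1 = rng.normal(size=(4, 8))
H2 = rng.normal(size=(5, 8))

# Pairwise squared-Euclidean cost, normalized for numerical stability.
cost = ((H1[:, None, :] - H2[None, :, :]) ** 2).sum(-1)
cost = cost / cost.max()

a = np.full(4, 1 / 4)  # uniform mass over H1's states
b = np.full(5, 1 / 5)  # uniform mass over H2's states
P = sinkhorn(cost, a, b)

# Barycentric mapping: express H1's states as P-weighted mixtures of H2's,
# aligning the two prompts' hidden-state distributions.
H1_aligned = (P / P.sum(1, keepdims=True)) @ H2
```

In practice a library such as POT (`ot.sinkhorn`) would replace the hand-rolled loop; the point is that the plan `P` matches states across prompts by geometric similarity rather than by position.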