FLEx: Language Modeling with Few-shot Language Explanations

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models frequently repeat the same errors, yet acquiring abundant expert-level natural language explanations is prohibitively expensive. This work proposes a fine-tuning-free approach that identifies representative errors via embedding clustering, validates and summarizes their explanations, and constructs a runtime prefix prompt that steers the model away from recurring mistakes. Requiring only a small number of exemplars with explanations, the method efficiently corrects repetitive errors while substantially reducing reliance on large-scale annotated data. Evaluated on the CounterBench, GSM8K, and ReasonIF benchmarks, it reduces residual errors by up to 83% compared to standard chain-of-thought (CoT) prompting.

📝 Abstract
Language models have become effective at a wide range of tasks, from math problem solving to open-domain question answering. However, they still make mistakes, and these mistakes are often repeated across related queries. Natural language explanations can help correct these errors, but collecting them at scale may be infeasible, particularly in domains where expert annotators are required. To address this issue, we introduce FLEx ($\textbf{F}$ew-shot $\textbf{L}$anguage $\textbf{Ex}$planations), a method for improving model behavior using a small number of explanatory examples. FLEx selects representative model errors using embedding-based clustering, verifies that the associated explanations correct those errors, and summarizes them into a prompt prefix that is prepended at inference time. This summary guides the model to avoid similar errors on new inputs, without modifying model weights. We evaluate FLEx on CounterBench, GSM8K, and ReasonIF. We find that FLEx consistently outperforms chain-of-thought (CoT) prompting across all three datasets and eliminates up to 83\% of CoT's remaining errors.
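The selection-and-summarization pipeline described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the error records, toy 2-D embeddings, and function names (`select_representatives`, `build_prefix`) are all hypothetical, a real system would embed errors with an actual embedding model, and FLEx's verification step (checking that each explanation truly corrects its error) is omitted here.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means over embedding vectors (illustrative, not optimized)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute centers.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels, centers

def select_representatives(errors, embeddings, k):
    """Pick one representative error per cluster: the point closest to the centroid."""
    labels, centers = kmeans(embeddings, k)
    reps = []
    for j in range(len(centers)):
        idx = np.where(labels == j)[0]
        if len(idx) == 0:
            continue  # a cluster may end up empty; skip it
        best = idx[np.argmin(np.linalg.norm(embeddings[idx] - centers[j], axis=1))]
        reps.append(errors[best])
    return reps

def build_prefix(reps):
    """Summarize the representative explanations into a prompt prefix."""
    lines = ["Avoid the following known mistakes:"]
    for i, err in enumerate(reps, 1):
        lines.append(f"{i}. On inputs like '{err['query']}': {err['explanation']}")
    return "\n".join(lines)

# Toy data: two families of recurring errors with hand-made 2-D embeddings.
errors = [
    {"query": "Is 91 prime?", "explanation": "Check divisibility by 7 before concluding primality."},
    {"query": "Is 77 prime?", "explanation": "Check divisibility by 7 before concluding primality."},
    {"query": "12 * 13", "explanation": "Carry digits carefully in multi-digit multiplication."},
    {"query": "14 * 15", "explanation": "Carry digits carefully in multi-digit multiplication."},
]
emb = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])

reps = select_representatives(errors, emb, k=2)
prefix = build_prefix(reps)
print(prefix)
```

At inference time the returned `prefix` would simply be prepended to each new query, steering the model away from the clustered error modes without touching its weights.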
Problem

Research questions and friction points this paper is trying to address.

language models
model errors
natural language explanations
few-shot learning
expert annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot Learning
Natural Language Explanations
Prompt Engineering
Error Correction
In-context Learning