🤖 AI Summary
Large language models (LLMs) are highly sensitive to minor lexical or syntactic variations in prompts, yet existing evaluation methods often rely on hand-crafted or unnatural perturbations that fail to reflect robustness under authentic language use. Method: We propose the first linguistics-driven framework of classified minimal transformations for prompt rewriting, characterized by fine-grained, controllable, and interpretable changes grounded in user context. The approach combines an adaptation of the BBQ benchmark, dual verification through human annotation and automated consistency checking, and quantitative stability analysis. Contribution/Results: Experiments show that natural paraphrasing induces accuracy fluctuations exceeding 20%, exposing a widespread lack of paraphrase robustness in current LLM evaluations. This work establishes a foundational paradigm for paraphrase-aware LLM assessment, moving evaluation standards toward linguistic realism and contextual fidelity.
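
To illustrate the kind of labeled, minimal transformation the summary refers to, here is a small sketch; the category names and rewrite rules are hypothetical examples for intuition only and are not the paper's actual taxonomy:

```python
"""Illustrative sketch of rule-based minimal prompt transformations.

Category names and rules are hypothetical, not the paper's taxonomy.
"""
import re

# Each hypothetical category maps to a (pattern, replacement) surface rewrite rule.
TRANSFORMATIONS = {
    "contraction": (r"\bdid not\b", "didn't"),             # lexical: full form -> contraction
    "synonym_swap": (r"\bfinish\b", "complete"),            # lexical: near-synonym substitution
    "politeness": (r"^Who\b", "Could you tell me who"),     # pragmatic: added hedge
}

def apply_transformation(prompt: str, category: str) -> str:
    """Apply one labeled minimal transformation; return the prompt unchanged if no match."""
    pattern, replacement = TRANSFORMATIONS[category]
    return re.sub(pattern, replacement, prompt)

original = "Who did not finish the report?"
for cat in TRANSFORMATIONS:
    print(cat, "->", apply_transformation(original, cat))
```

Each variant changes only one surface property of the prompt, which is what makes the resulting accuracy shifts attributable to that specific transformation.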
📝 Abstract
Small changes in how a prompt is worded can lead to meaningful differences in the behavior of large language models (LLMs), raising concerns about the stability and reliability of their evaluations. While prior work has explored simple formatting changes, these rarely capture the kinds of natural variation seen in real-world language use. We propose a controlled paraphrasing framework based on a taxonomy of minimal linguistic transformations to systematically generate natural prompt variations. Using the BBQ dataset, we validate our method with both human annotations and automated checks, then use it to study how LLMs respond to paraphrased prompts in stereotype evaluation tasks. Our analysis shows that even subtle prompt modifications can lead to substantial changes in model behavior. These results highlight the need for robust, paraphrase-aware evaluation protocols.
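
A minimal sketch of how paraphrase sensitivity could be quantified in such a setup, assuming a set of BBQ-style items, each answered under several validated paraphrase variants; the function names and data layout are assumptions, not the paper's code:

```python
"""Sketch of a paraphrase-stability analysis; data layout and names are assumed."""
from statistics import mean

def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions matching the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

def paraphrase_fluctuation(results: dict[str, list[str]], gold: list[str]) -> dict[str, float]:
    """Compare accuracy across paraphrase variants of the same ordered item set.

    `results` maps a variant id (e.g. "original", "contraction") to the model's
    answers on the same items in the same order.
    """
    per_variant = {variant: accuracy(preds, gold) for variant, preds in results.items()}
    accs = list(per_variant.values())
    return {
        "mean_accuracy": mean(accs),
        "fluctuation_range": max(accs) - min(accs),  # spread induced by paraphrasing alone
        **{f"acc[{v}]": a for v, a in per_variant.items()},
    }

# Toy example: three items with gold answers and two paraphrase variants.
gold = ["A", "B", "C"]
results = {"original": ["A", "B", "C"], "contraction": ["A", "C", "C"]}
print(paraphrase_fluctuation(results, gold))
```

The fluctuation range here plays the role of the paper's stability measure: a large gap between the best- and worst-performing variants of the same items signals a lack of paraphrase robustness.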