Towards Human Understanding of Paraphrase Types in Large Language Models

📅 2024-07-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing paraphrase evaluation methods predominantly rely on binary discrimination, offering no fine-grained, interpretable analysis of linguistic transformations. Method: This paper introduces the Atomic Paraphrase Type (APT) framework, which decomposes paraphrasing into ten fine-grained linguistic operations (e.g., insertion/deletion, subordinate-clause rewriting), and presents APTY, a manually annotated dataset of 800 instances with human preference rankings designed to support RLHF and DPO fine-tuning. The authors systematically evaluate ChatGPT and a DPO-fine-tuned Llama 7B model across all APT categories using five prompting techniques and human preference studies. Contribution/Results: Models perform robustly on simple operations (e.g., lexical insertion/deletion) but exhibit significant limitations on complex syntactic transformations (e.g., subordination changes). The APTY dataset and the APT analytical paradigm establish a benchmark and methodology for targeted language-capability modeling and controllable paraphrase optimization.

📝 Abstract
Paraphrases represent a human's intuitive ability to understand expressions presented in various ways. Current paraphrase evaluations of language models primarily use binary approaches, offering limited interpretability of specific text changes. Atomic paraphrase types (APT) decompose paraphrases into different linguistic changes and offer a granular view of the flexibility in linguistic expression (e.g., a shift in syntax or vocabulary used). In this study, we assess the human preferences towards ChatGPT in generating English paraphrases with ten APTs and five prompting techniques. We introduce APTY (Atomic Paraphrase TYpes), a dataset of 800 sentence-level and word-level annotations by 15 annotators. The dataset also provides a human preference ranking of paraphrases with different types that can be used to fine-tune models with RLHF and DPO methods. Our results reveal that ChatGPT and a DPO-trained Llama 7B model can generate simple APTs, such as additions and deletions, but struggle with complex structures (e.g., subordination changes). This study contributes to understanding which aspects of paraphrasing language models have already succeeded at understanding and what remains elusive. In addition, we show how our curated datasets can be used to develop language models with specific linguistic capabilities.
Problem

Research questions and friction points this paper is trying to address.

Evaluating paraphrase types in language models
Assessing human preferences for ChatGPT paraphrases
Developing datasets for fine-tuning linguistic capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Atomic Paraphrase Types decomposition
APTY dataset for fine-tuning
RLHF and DPO methods application
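The paper's preference rankings can feed DPO training, which consumes (prompt, chosen, rejected) pairs. A minimal sketch of how a best-first ranking of paraphrases might be expanded into such pairs; the record fields and example sentences are assumptions for illustration, not the actual APTY schema:

```python
# Hypothetical sketch: expand a human preference ranking of paraphrases
# (best-first) into pairwise (chosen, rejected) examples for DPO training.
# Field names ("prompt", "chosen", "rejected") follow the common DPO
# dataset convention; the APTY schema itself may differ.
from itertools import combinations

def ranking_to_dpo_pairs(source, ranked_paraphrases):
    """ranked_paraphrases is ordered best-first; each higher-ranked
    paraphrase becomes 'chosen' against every lower-ranked one."""
    return [
        {"prompt": source, "chosen": chosen, "rejected": rejected}
        # combinations preserves list order, so 'chosen' always
        # precedes 'rejected' in the ranking.
        for chosen, rejected in combinations(ranked_paraphrases, 2)
    ]

pairs = ranking_to_dpo_pairs(
    "The cat sat on the mat.",
    ["A cat was sitting on the mat.",   # rank 1 (most preferred)
     "The feline sat on the mat.",      # rank 2
     "Mat cat sit."],                   # rank 3 (least preferred)
)
# A ranking of n paraphrases yields n*(n-1)/2 preference pairs.
```

A ranking of three paraphrases thus produces three pairs, each usable directly as a DPO training example.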