🤖 AI Summary
This study identifies a fundamental limitation of large language models (LLMs): they fail to resolve the semantic ambiguity in puns, relying on shallow surface patterns rather than genuinely comprehending humorous incongruity. To assess this rigorously, we introduce PunBench, the first systematically enhanced pun detection benchmark, constructed via semantics-preserving rewrites that substitute critical pun constituents to increase difficulty, and we complement it with multi-dimensional robustness analysis and stringent human evaluation. Experiments reveal a catastrophic performance drop (an average accuracy decline of 32.7%) for mainstream LLMs fine-tuned on pun data, confirming that they lack genuine humor understanding. This work is the first to quantitatively expose LLMs' structural vulnerability to semantic puns; it also prompts critical reflection on deep linguistic representation learning and establishes a new benchmark and methodological framework for research on humor understanding and robust natural language understanding.
📝 Abstract
Puns are a form of humorous wordplay that exploits polysemy and phonetic similarity. While LLMs have shown promise in detecting puns, we show in this paper that their understanding often remains shallow, lacking the nuanced grasp typical of human interpretation. By systematically analyzing and reformulating existing pun benchmarks, we demonstrate how subtle changes in puns are sufficient to mislead LLMs. Our contributions include comprehensive and nuanced pun detection benchmarks, human evaluation across recent LLMs, and an analysis of the robustness challenges these models face in processing puns.
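As a purely illustrative sketch (not taken from the paper), the Python snippet below mimics the rewrite-and-re-evaluate setup the abstract describes: score a detector on original puns, then on rewrites in which a critical pun constituent has been substituted. The example sentences, labels, and the `detect_pun` stub (standing in for a prompted LLM query) are hypothetical placeholders, not PunBench items; the point is only to show how a detector that keys on surface templates keeps answering "pun" after the pun-carrying word is gone.

```python
# Minimal sketch of a perturbation probe for pun detection.
# Sentences, labels, and the cue-matching "detector" are illustrative
# placeholders, not items or prompts from the actual PunBench release.

from dataclasses import dataclass


@dataclass
class ProbePair:
    original: str            # pun sentence drawn from an existing benchmark
    rewritten: str           # rewrite substituting the critical pun constituent
    rewritten_is_pun: bool   # whether the rewrite still contains a pun


PAIRS = [
    ProbePair("I used to be a banker, but I lost interest.",
              "I used to be a banker, but I lost enthusiasm.", False),
    ProbePair("I'm reading a book about anti-gravity; it's impossible to put down.",
              "I'm reading a book about anti-gravity; it's impossible to stop reading.", False),
]


def detect_pun(sentence: str) -> bool:
    """Stand-in for an LLM yes/no query ("Does this sentence contain a pun?").

    Implemented as a shallow cue matcher to illustrate the failure mode:
    a detector keyed to surface templates keeps predicting "pun" even after
    the pun-carrying word has been substituted away.
    """
    shallow_cues = ("I lost", "impossible to")
    return any(cue in sentence for cue in shallow_cues)


def accuracy_on(field: str) -> float:
    """Accuracy over either the 'original' or the 'rewritten' sentences."""
    correct = 0
    for pair in PAIRS:
        gold = True if field == "original" else pair.rewritten_is_pun
        correct += detect_pun(getattr(pair, field)) == gold
    return correct / len(PAIRS)


if __name__ == "__main__":
    acc_orig = accuracy_on("original")
    acc_rew = accuracy_on("rewritten")
    print(f"accuracy on original puns:   {acc_orig:.2f}")
    print(f"accuracy on rewritten items: {acc_rew:.2f}")
    print(f"accuracy drop:               {acc_orig - acc_rew:.2f}")
```

In an actual evaluation, `detect_pun` would wrap a call to the model under test, and the gap between the two accuracies is the kind of drop reported above for the rewritten benchmark.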