Words at Play: Benchmarking Audio Pun Understanding in Large Audio-Language Models

📅 2026-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation and benchmark datasets for pun comprehension in spoken language within current large audio-language models. To bridge this gap, the authors introduce APUN-Bench, the first benchmark specifically designed for audio pun understanding, comprising 4,434 human-annotated audio samples organized into three staged tasks: pun recognition, pun word localization, and pun meaning inference. The study conducts a comprehensive evaluation of ten state-of-the-art models, revealing significant deficiencies in handling positional bias and in performing semantic reasoning. By establishing a structured evaluation framework and providing empirical analysis, this research lays foundational groundwork for advancing machine comprehension of spoken humor.

📝 Abstract
Puns represent a typical linguistic phenomenon that exploits polysemy and phonetic ambiguity to generate humour, posing unique challenges for natural language understanding. Within pun research, audio plays a central role in human communication alongside text and images, yet datasets and systematic resources for spoken puns remain scarce, leaving this crucial modality largely underexplored. In this paper, we present APUN-Bench, the first benchmark dedicated to evaluating large audio-language models (LALMs) on audio pun understanding. Our benchmark contains 4,434 audio samples annotated across three stages: pun recognition, pun word localization, and pun meaning inference. We conduct a deep analysis of APUN-Bench by systematically evaluating 10 state-of-the-art LALMs, uncovering substantial performance gaps in recognizing, localizing, and interpreting audio puns. This analysis reveals key challenges, such as positional biases in audio pun localization and recurrent error cases in meaning inference, offering actionable insights for advancing humour-aware audio intelligence.
Problem

Research questions and friction points this paper is trying to address.

audio pun understanding
large audio-language models
benchmark
humor comprehension
spoken puns
Innovation

Methods, ideas, or system contributions that make the work stand out.

audio pun understanding
large audio-language models
APUN-Bench
humor-aware audio intelligence
multimodal linguistic benchmark
Yuchen Su
School of Computer Science, University of Auckland, New Zealand
Shaoxin Zhong
School of Computer Science, University of Auckland, New Zealand
Yonghua Zhu
Singapore University of Technology and Design, Singapore
Ruofan Wang
School of Computer Science, University of Auckland, New Zealand
Zijian Huang
ECE PhD Candidate, University of Michigan
Qiqi Wang
School of Statistics and Data Science, Nankai University, China
Na Zhao
Singapore University of Technology and Design
Diana Benavides-Prado
School of Electronic Engineering and Computer Science, Queen Mary University of London
Michael Witbrock
Professor of Computer Science, Waipapa Taumata Rau: The University of Auckland