AdsQA: Towards Advertisement Video Understanding

📅 2025-09-10
🤖 AI Summary
This work addresses the challenge that large language models (LLMs) struggle to comprehend high-level semantic aspects of advertising videos—such as marketing logic, persuasive strategies, and audience engagement. To this end, we introduce AdsQA, the first multi-task benchmark for advertising video understanding, comprising 1,544 ads and 10,962 annotated video segments. We propose ReAd-R, a novel model integrating reinforcement learning, question-reflection mechanisms, and reward-driven optimization to enhance deep reasoning and answer generation in complex advertising contexts. Comprehensive evaluation across 14 mainstream LLMs demonstrates that ReAd-R significantly outperforms strong baselines—including those with advanced chain-of-thought capabilities—achieving state-of-the-art performance. Notably, this study pioneers the use of advertising videos as a rigorous testbed, advancing LLMs from low-level visual perception toward high-order marketing semantic cognition.

📝 Abstract
Large language models (LLMs) have taken a great step towards AGI. Meanwhile, an increasing number of domain-specific problems such as math and programming push these general-purpose models to continuously evolve by learning deeper expertise. Now is thus the time to further extend the diversity of specialized applications for knowledgeable LLMs, though collecting high-quality data with unexpected and informative tasks is challenging. In this paper, we propose to use advertisement (ad) videos as a challenging test-bed to probe the ability of LLMs to perceive beyond the objective physical content of the common visual domain. Our motivation is to take full advantage of ad videos' clue-rich and information-dense traits, e.g., marketing logic, persuasive strategies, and audience engagement. Our contribution is three-fold: (1) To our knowledge, this is the first attempt to use ad videos with well-designed tasks to evaluate LLMs. We contribute AdsQA, a challenging ad video QA benchmark derived from 1,544 ad videos with 10,962 clips, totaling 22.7 hours, providing 5 challenging tasks. (2) We propose ReAd-R, a DeepSeek-R1-styled RL model that reflects on questions and generates answers via reward-driven optimization. (3) We benchmark 14 top-tier LLMs on AdsQA, and our ReAd-R achieves the state of the art, outperforming strong competitors equipped with long-chain reasoning capabilities by a clear margin.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' ability to perceive beyond objective physical content in videos
Using advertisement videos to probe marketing logic and persuasive strategies
Building a challenging QA benchmark for ad video understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

AdsQA, a benchmark built from ad videos for LLM evaluation
ReAd-R, an RL model with reward-driven answer generation
State-of-the-art performance on ad video understanding tasks
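The summary describes ReAd-R's reward-driven optimization only at a high level. As a rough illustration of the general R1/GRPO-style recipe it references (not the paper's actual reward design), a sketch might score several sampled answers with a rule-based reward and normalize the rewards within the group to obtain advantages; the reward function and answer strings below are hypothetical:

```python
def answer_reward(candidate: str, reference: str) -> float:
    """Rule-based reward: 1.0 for an exact match, token-overlap credit otherwise."""
    if candidate.strip().lower() == reference.strip().lower():
        return 1.0
    cand = set(candidate.lower().split())
    ref = set(reference.lower().split())
    return len(cand & ref) / max(len(ref), 1)

def group_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: reward minus the group mean, scaled by the std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid dividing by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# Group of sampled answers to one (hypothetical) ad-video question.
candidates = ["brand awareness", "humor appeal", "brand awareness campaign"]
reference = "brand awareness"
rewards = [answer_reward(c, reference) for c in candidates]
advantages = group_advantages(rewards)
```

Answers scoring above the group mean get positive advantages and are reinforced; the policy-gradient update itself is omitted here.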
👥 Authors
Xinwei Long (Tsinghua University)
Kai Tian (Tsinghua University)
Peng Xu (Tsinghua University)
Guoli Jia (Tsinghua University)
Jingxuan Li (Independent Researcher)
Sa Yang (Peking University)
Yihua Shao (CASIA)
Kaiyan Zhang (Tsinghua University)
Che Jiang (Tsinghua University)
Hao Xu (Harvard University)
Yang Liu (Independent Researcher)
Jiaheng Ma (Independent Researcher)
Bowen Zhou (Tsinghua University, Shanghai Artificial Intelligence Lab)