Optimizing GPT for Video Understanding: Zero-Shot Performance and Prompt Engineering

📅 2025-02-13
🤖 AI Summary
This work addresses the limited zero-shot performance of GPT models on industrial video quality classification across seven fine-grained dimensions (e.g., sharpness, stability, color fidelity). We propose a “decompose–aggregate” prompt engineering framework: a complex quality judgment is split into sequential, interpretable subtask prompts, and their outputs are then logically aggregated to improve consistency. We also introduce a simplified decision policy that substantially reduces false negatives. Without fine-tuning or reliance on frame encoders, the method improves zero-shot accuracy over single-prompt baselines on real-world industrial video data, and it is presented as a scalable approach that can be deployed across domains.

📝 Abstract
In this study, we tackle industry challenges in video content classification by exploring and optimizing GPT-based models for zero-shot classification across seven critical categories of video quality. We contribute a novel approach to improving GPT's performance through prompt optimization and policy refinement, demonstrating that simplifying complex policies significantly reduces false negatives. Additionally, we introduce a new decomposition-aggregation-based prompt engineering technique, which outperforms traditional single-prompt methods. These experiments, conducted on real industry problems, show that thoughtful prompt design can substantially enhance GPT's performance without additional finetuning, offering an effective and scalable solution for improving video classification systems across various domains in industry.
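
The decomposition-aggregation idea described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the sub-question wording in `SUBTASK_PROMPTS`, the logical-OR aggregation rule, the use of a text description in place of real video input, and the `fake_model` stub that stands in for an actual GPT call so the example runs offline.

```python
from typing import Callable, Dict

# Hypothetical sub-questions; the paper's actual prompts are not shown on this page.
SUBTASK_PROMPTS: Dict[str, str] = {
    "blurry": "Is the video noticeably blurry? Answer yes or no.",
    "shaky": "Is the camera noticeably shaky? Answer yes or no.",
    "washed_out": "Are the colors noticeably washed out? Answer yes or no.",
}


def parse_yes_no(answer: str) -> bool:
    """Treat any answer that starts with 'yes' (case-insensitive) as positive."""
    return answer.strip().lower().startswith("yes")


def classify(video_description: str, ask_model: Callable[[str], str]) -> Dict[str, object]:
    """Ask one narrow prompt per subtask, then aggregate the answers."""
    sub_results = {
        name: parse_yes_no(ask_model(f"{prompt}\n\nVideo: {video_description}"))
        for name, prompt in SUBTASK_PROMPTS.items()
    }
    # Assumed aggregation rule: flag the video if any subtask is positive.
    return {"subtasks": sub_results, "violation": any(sub_results.values())}


def fake_model(prompt: str) -> str:
    """Offline stand-in for a GPT call; replace with a real client in practice."""
    text = prompt.lower()
    return "Yes" if "shaky" in text and "handheld" in text else "No"


result = classify("handheld footage of a street market", fake_model)
print(result)
```

Keeping each sub-answer in the result makes the final decision traceable to the individual prompts, which is the main practical appeal of splitting one complex prompt into several narrow ones.
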
Problem

Research questions and friction points this paper is trying to address.

Optimizing GPT for zero-shot video classification.
Improving GPT performance via prompt engineering.
Reducing false negatives through policy simplification.
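
For reference, the false negative rate that the last point refers to is FN / (FN + TP). The sketch below shows one way to compute it when comparing two policies; the labels and predictions are made-up toy data, not results from the paper, and the function name is hypothetical.

```python
from typing import Sequence


def false_negative_rate(y_true: Sequence[bool], y_pred: Sequence[bool]) -> float:
    """Return FN / (FN + TP): the share of truly positive items predicted negative."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Keep only the predictions for items whose true label is positive.
    preds_on_positives = [pred for truth, pred in zip(y_true, y_pred) if truth]
    if not preds_on_positives:
        raise ValueError("no positive labels, so the rate is undefined")
    misses = sum(1 for pred in preds_on_positives if not pred)
    return misses / len(preds_on_positives)


# Toy data for illustration only.
labels = [True, True, True, True, False]
complex_policy_preds = [True, False, False, True, False]
simple_policy_preds = [True, True, False, True, True]

print(false_negative_rate(labels, complex_policy_preds))
print(false_negative_rate(labels, simple_policy_preds))
```

Note that in this toy data the simpler policy misses fewer positives but also adds a false positive, so a comparison of policies would normally report both error types.
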
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT optimization for video understanding
Prompt engineering enhances performance
Decomposition-aggregation technique outperforms traditional single-prompt methods
Mark Beliaev
TikTok Inc.
Victor Yang
TikTok Inc.
Madhura Raju
TikTok Inc.
Jiachen Sun
University of Michigan, Ann Arbor
VLM, LLM, MLLM
Xinghai Hu
TikTok Inc.