HIPPO-Video: Simulating Watch Histories with Large Language Models for Personalized Video Highlighting

📅 2025-07-22
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing video highlight datasets lack user behavior modeling, hindering the characterization of personalized preferences. To address this, we propose the first large language model (LLM)-based user behavior simulator, generating a large-scale, annotated dataset comprising 20,400 videos with diverse viewing histories and fine-grained segment-level saliency scores. Methodologically, our approach integrates semantic classification with explicit preference modeling to synthesize realistic viewing sequences; we further introduce HiPHer, a preference-conditioned saliency prediction model for personalized video highlighting. Experiments demonstrate that HiPHer significantly outperforms both generic and query-based baselines on personalized video highlight detection, validating the effectiveness and generalizability of our paradigm.

πŸ“ Abstract
The exponential growth of video content has made personalized video highlighting an essential task, as user preferences are highly variable and complex. Existing video datasets, however, often lack personalization, relying on isolated videos or simple text queries that fail to capture the intricacies of user behavior. In this work, we introduce HIPPO-Video, a novel dataset for personalized video highlighting, created using an LLM-based user simulator to generate realistic watch histories reflecting diverse user preferences. The dataset includes 2,040 (watch history, saliency score) pairs, covering 20,400 videos across 170 semantic categories. To validate our dataset, we propose HiPHer, a method that leverages these personalized watch histories to predict preference-conditioned segment-wise saliency scores. Through extensive experiments, we demonstrate that our method outperforms existing generic and query-based approaches, showcasing its potential for highly user-centric video highlighting in real-world scenarios.
Problem

Research questions and friction points this paper is trying to address.

Personalized video highlighting for diverse user preferences
Lack of personalization in existing video datasets
Predicting saliency scores using watch histories
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based user simulator for watch histories
Personalized saliency scores from histories
Outperforms generic and query-based methods
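The core idea behind preference-conditioned saliency prediction can be sketched as follows: aggregate a user's watch history into a preference representation, then score each candidate segment against it. The snippet below is a minimal illustrative baseline, not the paper's HiPHer model (which is a learned neural predictor); the function names, the mean-pooling aggregation, and the cosine-similarity scoring are all simplifying assumptions for exposition.

```python
import numpy as np

def preference_embedding(history: np.ndarray) -> np.ndarray:
    """Aggregate watched-segment embeddings into a single unit-norm
    preference vector (mean pooling is an illustrative choice)."""
    v = history.mean(axis=0)
    return v / np.linalg.norm(v)

def saliency_scores(history: np.ndarray, segments: np.ndarray) -> np.ndarray:
    """Score each candidate segment by cosine similarity between its
    embedding and the user's preference vector."""
    pref = preference_embedding(history)
    segs = segments / np.linalg.norm(segments, axis=1, keepdims=True)
    return segs @ pref

# Toy data: 5 watched segments and 8 candidate segments, 16-dim embeddings.
rng = np.random.default_rng(0)
history = rng.normal(size=(5, 16))
segments = rng.normal(size=(8, 16))

scores = saliency_scores(history, segments)      # one saliency score per segment
highlight = int(np.argmax(scores))               # highest-saliency segment index
```

In the actual setting, the segment embeddings would come from a video encoder and the aggregation/scoring would be learned end-to-end; this sketch only shows how conditioning on watch history changes the saliency ranking per user.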