🤖 AI Summary
This work addresses two challenges in creative video generation: semantic misalignment, which disconnects product selling points from video content, and inadequate motion adaptability, which produces unnatural motion. To tackle these issues, the study builds a comprehensive Advertising Creative Knowledge Base (ACKB) and proposes a knowledge-driven creative video generation framework, KD-CVG, comprising a Semantic-Aware Retrieval (SAR) module and a Multimodal Knowledge Reference (MKR) module. SAR uses graph attention networks and reinforcement learning feedback to align selling points with creative videos, while MKR injects semantic and motion priors into the text-to-video model to improve motion plausibility. Experiments show that the proposed method outperforms state-of-the-art approaches in both semantic alignment and motion adaptability, supporting the knowledge-driven paradigm for creative video generation.
📝 Abstract
Creative Generation (CG) leverages generative models to automatically produce advertising content that highlights product features, and it has been a significant focus of recent research. However, while CG has advanced considerably, most efforts have concentrated on generating advertising text and images, leaving Creative Video Generation (CVG) relatively underexplored. This gap is largely due to two major challenges faced by Text-to-Video (T2V) models: (a) ambiguous semantic alignment, where models struggle to accurately correlate product selling points with creative video content, and (b) inadequate motion adaptability, which results in unrealistic movements and distortions. To address these challenges, we develop a comprehensive Advertising Creative Knowledge Base (ACKB) as a foundational resource and propose a knowledge-driven approach (KD-CVG) to overcome the knowledge limitations of existing models. KD-CVG consists of two primary modules: Semantic-Aware Retrieval (SAR) and Multimodal Knowledge Reference (MKR). SAR uses the semantic awareness of graph attention networks together with reinforcement learning feedback to improve the model's understanding of the connections between selling points and creative videos. Building on this, MKR incorporates semantic and motion priors into the T2V model to fill its knowledge gaps. Extensive experiments demonstrate that KD-CVG outperforms other state-of-the-art methods in both semantic alignment and motion adaptability. The code and dataset will be open-sourced at https://kdcvg.github.io/KDCVG/.