BrandFusion: A Multi-Agent Framework for Seamless Brand Integration in Text-to-Video Generation

📅 2026-03-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work introduces the task of seamless brand integration in text-to-video generation, which requires simultaneously preserving semantic fidelity to user intent, keeping the brand recognizable, and blending it naturally into the scene. The authors propose BrandFusion, a multi-agent framework that builds a brand knowledge base offline and orchestrates five online agents to iteratively refine generation prompts. The knowledge base is built by probing the T2V model's existing brand priors and adapting to unfamiliar brands through lightweight fine-tuning. Experiments on 18 established and 2 custom brands show that BrandFusion outperforms baselines in semantic fidelity, brand recognizability, and contextual naturalness, and human evaluations report higher user satisfaction.

📝 Abstract

The rapid advancement of text-to-video (T2V) models has revolutionized content creation, yet their commercial potential remains largely untapped. We introduce, for the first time, the task of seamless brand integration in T2V: automatically embedding advertiser brands into prompt-generated videos while preserving semantic fidelity to user intent. This task confronts three core challenges: maintaining prompt fidelity, ensuring brand recognizability, and achieving contextually natural integration. To address them, we propose BrandFusion, a novel multi-agent framework comprising two synergistic phases. In the offline phase (advertiser-facing), we construct a Brand Knowledge Base by probing model priors and adapting to novel brands via lightweight fine-tuning. In the online phase (user-facing), five agents iteratively refine user prompts, leveraging the shared knowledge base and real-time contextual tracking to ensure brand visibility and semantic alignment. Experiments on 18 established and 2 custom brands across multiple state-of-the-art T2V models demonstrate that BrandFusion significantly outperforms baselines in semantic preservation, brand recognizability, and integration naturalness. Human evaluations further confirm higher user satisfaction, establishing a practical pathway for sustainable T2V monetization.
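
The online phase described above can be pictured as a propose-and-review loop over the user prompt. The Python sketch below is illustrative only: the abstract does not name the five agents or specify their interfaces, so the two stub functions (`integrate`, `critique`), the `KNOWLEDGE_BASE` layout, the brand name "Acme", and the `max_rounds` parameter are all assumptions made for this example, not the authors' design. Real agents would presumably be model-backed rather than string templates.

```python
# Hypothetical stand-in for the offline Brand Knowledge Base.
# The field name is an assumption; the paper's actual schema is not given here.
KNOWLEDGE_BASE = {
    "Acme": {"needs_finetuning": False},
}


def integrate(prompt: str, brand: str, feedback: list[str]) -> str:
    """Stub 'integration' agent: draft a prompt that mentions the brand.

    The first draft is deliberately generic; reviewer feedback makes it explicit.
    """
    if "brand missing" in feedback:
        clause = f"with the {brand} logo visible"
    else:
        clause = "with a branded product visible"
    return f"{prompt}, {clause}"


def critique(original: str, candidate: str, brand: str) -> list[str]:
    """Stub 'review' agent: list problems with the candidate (empty if none)."""
    problems = []
    if original not in candidate:
        problems.append("user intent altered")  # prompt-fidelity check
    if brand not in candidate:
        problems.append("brand missing")        # recognizability check
    return problems


def refine_prompt(prompt: str, brand: str, max_rounds: int = 3) -> str:
    """Iteratively refine the prompt until the reviewer raises no problems."""
    if brand not in KNOWLEDGE_BASE:
        raise KeyError(f"{brand!r} has no knowledge-base entry")
    feedback: list[str] = []
    candidate = prompt
    for _ in range(max_rounds):
        candidate = integrate(prompt, brand, feedback)
        feedback = critique(prompt, candidate, brand)
        if not feedback:
            break
    return candidate


if __name__ == "__main__":
    print(refine_prompt("a runner jogging at sunrise", "Acme"))
```

In this toy version the first draft fails the recognizability check, the feedback is passed back to the drafting step, and the second draft passes. The point is only the control flow: drafting and reviewing are separate roles that share the knowledge base and exchange feedback across rounds.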

Problem

Research questions and friction points this paper is trying to address.

brand integration

text-to-video generation

semantic fidelity

brand recognizability

contextual naturalness

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

brand integration

text-to-video generation