AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the high computational cost and cross-shot inconsistency of long-form music video generation with a global planning framework formulated as a Multiple-Choice Knapsack Problem (MCKP). The approach builds a structured persistent state comprising character entities, scene priors, and a sharing graph, and reuses visual prefixes across repeated musical motifs to maintain rhythmic coherence while reducing computation. By combining multimodal saliency estimation, dynamic-programming optimization, and a hierarchical forking-and-reuse mechanism, the method achieves an optimal trade-off between perceptual quality and resource consumption under strict budgetary and rhythmic constraints, as measured by the Cost-Quality Ratio (CQR) metric.
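For readers unfamiliar with the MCKP, the textbook form of the problem is written out below. The mapping to music video segments is an assumption made for illustration: the symbols $n$ (number of segments), $K_i$ (branches available to segment $i$), $v_{ik}$ (estimated quality of rendering segment $i$ with branch $k$), $c_{ik}$ (its cost), and $B$ (total budget) do not come from the paper, and the paper's own objective and rhythmic constraints may differ.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} \sum_{k \in K_i} v_{ik}\, x_{ik} \\
\text{s.t.} \quad & \sum_{i=1}^{n} \sum_{k \in K_i} c_{ik}\, x_{ik} \le B, \\
& \sum_{k \in K_i} x_{ik} = 1 \quad \text{for } i = 1, \dots, n, \\
& x_{ik} \in \{0, 1\}.
\end{aligned}
```

The second constraint is what separates the MCKP from the ordinary 0/1 knapsack: every segment must receive exactly one branch, so the planner cannot drop a segment to save budget.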
📝 Abstract
Generating long-horizon music videos (MVs) is frequently constrained by prohibitive computational costs and difficulty maintaining cross-shot consistency. We propose AllocMV, a hierarchical framework formulating music video synthesis as a Multiple-Choice Knapsack Problem (MCKP). AllocMV represents the video's persistent state as a compact, structured object comprising character entities, scene priors, and sharing graphs, produced by a global planner prior to realization. By estimating segment saliency from multimodal cues, a group-level MCKP solver based on dynamic programming optimally allocates resources across High-Gen, Mid-Gen, and Reuse branches. For repetitive musical motifs, we implement a divergence-based forking strategy that reuses visual prefixes to reduce costs while ensuring motif-level continuity. Evaluated via the Cost-Quality Ratio (CQR), AllocMV achieves an optimal trade-off between perceived quality and resource expenditure under strict budgetary and rhythmic constraints.
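The allocation step described in the abstract can be illustrated with a generic dynamic-programming solver for the MCKP. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `solve_mckp`, the integer cost units, and all cost and value numbers are hypothetical, and only the branch names High-Gen, Mid-Gen, and Reuse come from the abstract. Saliency estimation and the rhythmic constraints are not modeled.

```python
from typing import List, Tuple

# (branch name, integer cost, estimated value). All numbers are made up.
Option = Tuple[str, int, float]


def solve_mckp(groups: List[List[Option]], budget: int) -> Tuple[float, List[str]]:
    """Pick exactly one option per group, maximizing total value
    subject to total cost <= budget (textbook MCKP dynamic program)."""
    neg_inf = float("-inf")
    # prev[b] = best value over the groups processed so far with cost <= b.
    prev = [0.0] * (budget + 1)
    picks: List[List[int]] = []  # picks[g][b] = option index chosen for group g
    for options in groups:
        row = [neg_inf] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            for k, (_, cost, value) in enumerate(options):
                if cost <= b and prev[b - cost] != neg_inf:
                    candidate = prev[b - cost] + value
                    if candidate > row[b]:
                        row[b] = candidate
                        pick[b] = k
        picks.append(pick)
        prev = row
    if prev[budget] == neg_inf:
        raise ValueError("budget too small to assign one branch per segment")
    # Backtrack from the full budget to recover the chosen branch per group.
    plan: List[str] = []
    b = budget
    for g in range(len(groups) - 1, -1, -1):
        name, cost, _ = groups[g][picks[g][b]]
        plan.append(name)
        b -= cost
    plan.reverse()
    return prev[budget], plan


# Hypothetical three-segment song: chorus, verse, repeated chorus.
# Reuse is assumed to score well on the repeated chorus because a
# visual prefix from the first chorus is already available.
groups = [
    [("High-Gen", 5, 9.0), ("Mid-Gen", 3, 6.0), ("Reuse", 1, 2.0)],
    [("High-Gen", 5, 5.0), ("Mid-Gen", 3, 4.0), ("Reuse", 1, 1.0)],
    [("High-Gen", 5, 9.0), ("Mid-Gen", 3, 6.0), ("Reuse", 1, 8.0)],
]
best_value, plan = solve_mckp(groups, budget=9)
print(best_value, plan)  # 21.0 ['High-Gen', 'Mid-Gen', 'Reuse']
```

With a budget of 9 cost units, the solver spends most of the budget on the first chorus, gives the verse a mid-tier render, and reuses content for the repeated chorus. The table has one row per segment and one column per budget unit, so running time grows with the number of segments times the budget times the number of branches.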
Problem

Research questions and friction points this paper is trying to address.

music video generation
computational cost
cross-shot consistency
long-horizon video synthesis
resource allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

music video generation
resource allocation
structured persistent state
Multiple-Choice Knapsack Problem
cross-shot consistency