AllocMV: Optimal Resource Allocation for Music Video Generation via Structured Persistent State

📅 2026-05-11

🤖 AI Summary

This work addresses the high computational cost of long-form music video generation and the difficulty of keeping it consistent across shots. It proposes a global planning framework that formulates resource allocation as a Multiple-Choice Knapsack Problem (MCKP). The approach builds a structured persistent state made up of character entities, scene priors, and sharing graphs. It also introduces a visual prefix reuse strategy, driven by beat repetition, that preserves rhythmic coherence while substantially reducing computation. The method combines multimodal saliency estimation, dynamic programming optimization, and a hierarchical forking-and-reuse mechanism. The authors report that it achieves an optimal trade-off between perceptual quality and resource consumption under strict budgetary and rhythmic constraints, as measured by the Cost-Quality Ratio (CQR) metric.
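The forking-and-reuse idea can be illustrated with a minimal Python sketch. It assumes that each occurrence of a repeated motif is described as a sequence of hashable per-beat shot descriptors, which is a hypothetical representation and not the paper's. It further assumes that two occurrences can share rendered content for as long as their descriptors agree. Under those assumptions, inserting the occurrences into a prefix tree renders each shared prefix once and forks only where the descriptors diverge. The names `divergence_index`, `build_fork_tree`, and `count_render_units` are illustrative.

```python
from typing import Dict, Hashable, Sequence

# Hypothetical: a fork tree maps a beat descriptor to the subtree of beats that follow it.
ForkTree = Dict[Hashable, "ForkTree"]


def divergence_index(occurrences: Sequence[Sequence[Hashable]]) -> int:
    """Return the length of the prefix shared by all occurrences (the first fork point)."""
    if not occurrences:
        return 0
    shortest = min(len(occ) for occ in occurrences)
    for i in range(shortest):
        first = occurrences[0][i]
        if any(occ[i] != first for occ in occurrences[1:]):
            return i
    return shortest


def build_fork_tree(occurrences: Sequence[Sequence[Hashable]]) -> ForkTree:
    """Insert every occurrence into a prefix tree; shared prefixes become shared nodes."""
    root: ForkTree = {}
    for occ in occurrences:
        node = root
        for beat in occ:
            node = node.setdefault(beat, {})
    return root


def count_render_units(tree: ForkTree) -> int:
    """Count tree nodes; each node is one beat segment rendered once and reused by its descendants."""
    return sum(1 + count_render_units(child) for child in tree.values())


# Three occurrences of a chorus motif, one hypothetical shot descriptor per beat.
chorus = [
    ["A", "B", "C", "D"],
    ["A", "B", "C", "E"],
    ["A", "B", "X", "Y"],
]

naive = sum(len(occ) for occ in chorus)          # every beat rendered independently
forked = count_render_units(build_fork_tree(chorus))
print(divergence_index(chorus), naive, forked)   # 2 12 7
```

In this toy example, a naive pass renders 12 beat segments and the fork tree renders 7. All three occurrences first diverge after beat 2, and two of them share one more beat before they fork again, which is where the hierarchy comes from. Real savings depend on whether the generator can continue deterministically from a shared visual prefix. This sketch simply assumes that it can.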

📝 Abstract

Generating long-horizon music videos (MVs) is frequently constrained by prohibitive computational costs and the difficulty of maintaining cross-shot consistency. We propose AllocMV, a hierarchical framework formulating music video synthesis as a Multiple-Choice Knapsack Problem (MCKP). AllocMV represents the video's persistent state as a compact, structured object comprising character entities, scene priors, and sharing graphs, produced by a global planner prior to realization. Using segment saliency estimated from multimodal cues, a group-level MCKP solver based on dynamic programming optimally allocates resources across High-Gen, Mid-Gen, and Reuse branches. For repetitive musical motifs, we implement a divergence-based forking strategy that reuses visual prefixes to reduce costs while ensuring motif-level continuity. Evaluated via the Cost-Quality Ratio (CQR), AllocMV achieves an optimal trade-off between perceived quality and resource expenditure under strict budgetary and rhythmic constraints.
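The group-level allocation step can be illustrated with the textbook dynamic program for the Multiple-Choice Knapsack Problem. Each segment is a group, exactly one branch (High-Gen, Mid-Gen, or Reuse) is chosen per group, and the total cost must stay within the budget. The minimal Python sketch below works under assumed inputs. Costs are integer compute units, and each option's value is the segment's saliency times a fixed per-branch quality factor. The cost table, the quality factors, and the function names `solve_mckp` and `branch_options` are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Option:
    branch: str   # "High-Gen", "Mid-Gen", or "Reuse"
    cost: int     # hypothetical integer compute units
    value: float  # hypothetical saliency-weighted quality


def solve_mckp(groups: Sequence[Sequence[Option]], budget: int) -> Tuple[float, List[Option]]:
    """Choose exactly one option per group, maximizing total value with total cost <= budget."""
    neg = float("-inf")
    n = len(groups)
    # dp[i][b]: best value for the first i groups using at most b cost units.
    dp = [[0.0] * (budget + 1)] + [[neg] * (budget + 1) for _ in range(n)]
    pick = [[-1] * (budget + 1) for _ in range(n)]
    for i, options in enumerate(groups, start=1):
        for b in range(budget + 1):
            for k, opt in enumerate(options):
                if opt.cost <= b and dp[i - 1][b - opt.cost] > neg:
                    cand = dp[i - 1][b - opt.cost] + opt.value
                    if cand > dp[i][b]:
                        dp[i][b], pick[i - 1][b] = cand, k
    if dp[n][budget] == neg:
        raise ValueError("no allocation fits within the budget")
    # Backtrack from the full budget to recover the chosen branch of each group.
    choices: List[Option] = []
    b = budget
    for i in range(n - 1, -1, -1):
        opt = groups[i][pick[i][b]]
        choices.append(opt)
        b -= opt.cost
    choices.reverse()
    return dp[n][budget], choices


def branch_options(saliency: float) -> List[Option]:
    """Hypothetical cost/quality table: value = saliency * per-branch quality factor."""
    return [
        Option("High-Gen", 5, saliency * 1.0),
        Option("Mid-Gen", 2, saliency * 0.7),
        Option("Reuse", 0, saliency * 0.3),
    ]


saliency = [0.9, 0.2, 0.6]   # hypothetical per-segment saliency scores
value, plan = solve_mckp([branch_options(s) for s in saliency], budget=7)
print([o.branch for o in plan], round(value, 2))
# ['High-Gen', 'Reuse', 'Mid-Gen'] 1.38
```

With a 7-unit budget, the solver gives High-Gen to the most salient segment, reuses the least salient one, and gives Mid-Gen to the remaining segment. The DP runs in O(n · B · k) time for n segments, budget B, and k branches per segment. Because this is pseudo-polynomial rather than polynomial, the sketch discretizes costs to integers.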

Problem

Research questions and friction points this paper is trying to address.

music video generation

computational cost

cross-shot consistency

long-horizon video synthesis

resource allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

music video generation

resource allocation

structured persistent state