🤖 AI Summary
Video generation models perform well in realistic scenarios but degrade significantly on tasks requiring imagination—such as prompts combining rarely co-occurring concepts with long-distance semantic relations—because such inputs fall outside the training distribution. Existing test-time scaling methods employ fixed search spaces and static reward functions, lacking semantic adaptability to the input prompt. To address this, we propose ImagerySearch, a prompt-guided adaptive test-time search framework that dynamically adjusts both the inference search space and the reward function to improve temporal coherence and visual plausibility. We also introduce LDT-Bench, the first benchmark explicitly designed to evaluate long-distance semantic understanding in video generation, together with an automated evaluation protocol that quantifies creative generation capability. Experiments show that our method substantially outperforms state-of-the-art baselines and existing test-time scaling approaches on LDT-Bench, while achieving competitive gains on VBench.
📝 Abstract
Video generation models have achieved remarkable progress, particularly excelling in realistic scenarios; however, their performance degrades notably in imaginative ones. Imaginative prompts often involve rarely co-occurring concepts with long-distance semantic relationships that fall outside the training distribution. Existing methods typically apply test-time scaling to improve video quality, but their fixed search spaces and static reward designs limit adaptability to imaginative scenarios. To fill this gap, we propose ImagerySearch, a prompt-guided adaptive test-time search strategy that dynamically adjusts both the inference search space and the reward function according to semantic relationships in the prompt. This yields more coherent and visually plausible videos in challenging imaginative settings. To evaluate progress in this direction, we introduce LDT-Bench, the first dedicated benchmark for long-distance semantic prompts, consisting of 2,839 diverse concept pairs and an automated protocol for assessing creative generation capabilities. Extensive experiments show that ImagerySearch consistently outperforms strong video generation baselines and existing test-time scaling approaches on LDT-Bench, and achieves competitive improvements on VBench, demonstrating its effectiveness across diverse prompt types. We will release LDT-Bench and code to facilitate future research on imaginative video generation.
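To make the core idea concrete, here is a minimal, hypothetical sketch of prompt-guided adaptive test-time search. The abstract does not specify ImagerySearch's actual algorithm; everything below (the toy embeddings, `semantic_distance`, `adaptive_budget`, and `search`) is an illustrative assumption: prompts whose concepts are semantically distant receive a larger candidate search budget, and the best candidate under a reward function is kept.

```python
import itertools

# Toy 2-D concept embeddings standing in for a real text encoder (assumption).
TOY_EMBEDDINGS = {
    "cat": (1.0, 0.0), "dog": (0.9, 0.1),
    "violin": (0.0, 1.0), "volcano": (-0.5, 0.8),
}

def semantic_distance(a, b):
    """Cosine distance between two toy concept embeddings."""
    va, vb = TOY_EMBEDDINGS[a], TOY_EMBEDDINGS[b]
    dot = sum(x * y for x, y in zip(va, vb))
    na = sum(x * x for x in va) ** 0.5
    nb = sum(x * x for x in vb) ** 0.5
    return 1.0 - dot / (na * nb)

def adaptive_budget(concepts, base=4, scale=8):
    """Grow the candidate search space with the most distant concept pair."""
    pairs = list(itertools.combinations(concepts, 2))
    if not pairs:
        return base
    max_dist = max(semantic_distance(a, b) for a, b in pairs)
    return base + int(scale * max_dist)

def search(concepts, generate, reward):
    """Sample a prompt-dependent number of candidates; keep the best-scoring one.

    `generate` stands in for a video generator, `reward` for a
    (possibly prompt-adapted) scoring function -- both hypothetical.
    """
    n = adaptive_budget(concepts)
    candidates = [generate(i) for i in range(n)]
    return max(candidates, key=reward)
```

In this sketch, a near-synonymous pair like ("cat", "dog") keeps the base budget, while a long-distance pair like ("cat", "volcano") triggers a much wider search, mirroring the paper's claim that imaginative prompts need a larger, semantically adapted search space.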