Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals

📅 2026-01-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing video generation models make it hard to specify goals that involve physical dynamics, because text and images alone cannot pin such goals down precisely. To address this limitation, this work proposes Goal Force, a novel framework that takes user-defined force vectors and intermediate physical dynamics as explicit conditioning inputs to guide the generation of physically plausible videos. The model is trained only on a dataset of simple synthetic causal primitives, such as elastic collisions and falling dominoes, yet it generalizes zero-shot to complex real-world scenarios. Experiments indicate strong physical reasoning in tasks involving tool manipulation and multi-object causal chains, so the model effectively functions as an implicit neural physics simulator.

📝 Abstract

Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals for these models remains a challenge; text instructions are often too abstract to capture physical nuances, while target images are frequently infeasible to specify for dynamic tasks. To address this, we introduce Goal Force, a novel framework that allows users to define goals via explicit force vectors and intermediate dynamics, mirroring how humans conceptualize physical tasks. We train a video generation model on a curated dataset of synthetic causal primitives, such as elastic collisions and falling dominos, teaching it to propagate forces through time and space. Despite being trained on simple physics data, our model exhibits remarkable zero-shot generalization to complex, real-world scenarios, including tool manipulation and multi-object causal chains. Our results suggest that by grounding video generation in fundamental physical interactions, models can emerge as implicit neural physics simulators, enabling precise, physics-aware planning without reliance on external engines. We release all datasets, code, model weights, and interactive video demos at our project page.
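
To make the term "causal primitive" concrete, here is a minimal sketch of the textbook physics behind one of the primitives the abstract names, a one-dimensional elastic collision. It is an illustration only and is not taken from the paper's data pipeline: the `Body` class and the `elastic_collision_1d` function are hypothetical names, and the 1D point-mass setting is an assumption made for brevity.

```python
from dataclasses import dataclass


@dataclass
class Body:
    """Hypothetical point mass moving along a line (illustrative only)."""
    mass: float
    velocity: float


def elastic_collision_1d(a: Body, b: Body) -> tuple[Body, Body]:
    """Return both bodies after a perfectly elastic head-on collision.

    Uses the standard closed-form solution that conserves both total
    momentum and total kinetic energy.
    """
    total_mass = a.mass + b.mass
    va = ((a.mass - b.mass) * a.velocity + 2 * b.mass * b.velocity) / total_mass
    vb = ((b.mass - a.mass) * b.velocity + 2 * a.mass * a.velocity) / total_mass
    return Body(a.mass, va), Body(b.mass, vb)


def momentum(*bodies: Body) -> float:
    """Total linear momentum of the given bodies."""
    return sum(x.mass * x.velocity for x in bodies)


def kinetic_energy(*bodies: Body) -> float:
    """Total kinetic energy of the given bodies."""
    return sum(0.5 * x.mass * x.velocity ** 2 for x in bodies)
```

With equal masses, a moving body that strikes a resting one hands over its entire velocity, which is the simplest case of a force "propagating" from one object to the next.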

Problem

Research questions and friction points this paper is trying to address.

goal specification

video generation

physics-conditioned goals

physical interaction

world models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Goal Force

video generation

physics-conditioned goals