Research Engineer/Scientist (all levels), World Models

About the job

The Vision-Applied Research team focuses on applied research in Generative AI and CV/Multimodal Understanding, delivering intelligent solutions to TikTok that make it much easier for users to create and share content. The team has research groups dedicated to generative models for content creation, image generation, video synthesis, intelligent image/video editing, and world models. The team is looking for Research Engineers/Scientists who can take the initiative in building next-generation World Models. The candidate will develop methods and infrastructure to train large-scale generative models on massive simulated and real-world multimodal datasets. This role places particular emphasis on ensuring long-horizon temporal consistency, realistic physics, and complex dynamics in the model's outputs, and on enabling users and agents to interact with the model in real time.

Responsibilities

Develop large-scale, diverse, and interactive multimodal data generation pipelines.

Develop training pipelines for long-context interactive video generation models.

Advance video generation models to capture long-horizon temporal consistency, realistic physical dynamics, object interactions, and causal relationships from large-scale multimodal data.

Qualifications

Minimum

M.S. or Ph.D. in Computer Vision, Computer Graphics, Machine Learning, or equivalent experience.

Extensive research experience in GenAI broadly, multimodal foundation models, or Embodied AI.

Demonstrated ability to communicate complex technical concepts and collaborate effectively within cross-functional research teams.

Preferred

Proven experience in at least one of the following areas: video generation and synthesis; efficient and real-time diffusion models; 3D/physics-based simulation; or reinforcement learning for agentic environment interaction.

Proven track record of first-author publications in prestigious venues such as CVPR, ICLR, NeurIPS, SIGGRAPH, and ICML.