Research Scientist - Seed Multimodal Interaction and World Model

ByteDance
China / Singapore / United States | 2025-07-31 | Computer Vision

About the job

The Seed Multimodal Interaction and World Model team is dedicated to developing models that boast human-level multimodal understanding and interaction capabilities. The team also aspires to advance the exploration and development of multimodal assistant products.

Responsibilities

- Research and develop large-scale multimodal foundation models

- Develop unified modeling frameworks that integrate video, audio, and language, with a focus on visual latent reasoning

- Explore reinforcement learning-based approaches to bridge understanding and generation for multimodal visual reasoning

- Collaborate with researchers to evaluate models on tasks involving world modeling, reasoning, and instruction-conditioned generation

Qualifications

Minimum

- Master's or PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline

- Publications in accredited venues, such as CVPR, ECCV, ICCV, NeurIPS, ICLR, ICML, or other leading conferences

- Strong research background in at least one of the following: reinforcement learning, multimodal learning, video understanding, or vision-language modeling

Preferred

- Experience with reinforcement learning in multimodal or interactive environments

- Familiarity with video generation or diffusion-based generative models

- Experience with large-scale model training

- Solid programming and engineering skills, with experience building training or evaluation pipelines for ML models