Research Scientist, Multimodal Generative AI (Intelligent Creation) – Global Frontier Tech Recruitment Program - 2027 Start (PhD)

TikTok
San Jose, California

About the job

With rapid advances in AGI and foundation models, multimodal content creation across text, image, and video is undergoing a fundamental transformation. This role focuses on building next-generation multimodal and agentic foundation models that enable intelligent, efficient, and end-to-end creative workflows. You will work on full-modal understanding, AIGC-based image and video generation, and agentic systems, optimizing models through large-scale training and post-training (e.g., SFT, RL). The role also involves designing efficient model architectures and advancing reinforcement learning techniques to improve model capability, scalability, and real-world performance in creation scenarios.

Responsibilities

Conduct research and development in generative AI and multimodal models (e.g., image, video).

Develop large-scale foundation models (LLMs/VLMs) through post-training techniques.

Design and build models for creative applications on the TikTok platform, driving end-to-end impact from research to production through cross-functional collaboration.

Explore new AI-driven product opportunities and contribute to next-generation creative experiences.

Qualifications

Minimum

Currently completing or recently completed a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline.

Proficiency in training generative AI or large models using frameworks such as PyTorch or JAX.

Strong programming skills and solid fundamentals in machine learning.

Strong problem-solving ability and motivation to tackle real-world challenges.

Good communication and collaboration skills in fast-paced environments.

Preferred

PhD in Generative AI, Machine Learning Systems, or a related field, or equivalent experience.

Strong research background in one or more of the following areas: generative AI, LLMs/VLMs, or ML systems.

Hands-on experience in at least one of the following areas: image/video generation and editing; VLM/LLM fine-tuning; efficient model design and optimization; reinforcement learning methods (e.g., RLHF, DPO, GRPO).

Track record of research publications at conferences such as CVPR, ECCV, ICCV, NeurIPS, ICLR, SIGGRAPH, or SIGGRAPH Asia.