Research Engineer, Global E-commerce

About the job

The e-commerce industry has seen tremendous growth in recent years and has become a hotly contested space amongst leading internet companies. With millions of loyal users globally, we believe TikTok is an ideal platform to deliver a brand-new and better e-commerce experience to our users. Our AI Innovation team is dedicated to building next-generation e-commerce experiences powered by AI-generated content (AIGC) and agentic shopping systems. We explore how generated content and intelligent agents can meaningfully enhance user experiences and reshape the way people discover, evaluate, and purchase products.

Responsibilities

- Develop and optimize AIGC video generation models, with a focus on multimodal post-training for e-commerce short-video scenarios.

- Design and implement post-training approaches that integrate text, image, video, and product metadata to improve controllability, visual fidelity, and semantic alignment.

- Design and optimize training strategies, loss functions, and evaluation protocols for video generation models, balancing generation quality, robustness, and inference efficiency.

- Improve video generation performance along key dimensions such as motion realism, identity preservation, prompt adherence, and cross-modal consistency.

Qualifications

Minimum

- Bachelor’s degree in Computer Science, Machine Learning, Computer Vision, or a related technical field.

- Strong understanding of modern video generative model architectures, such as diffusion models, Transformer-based models (e.g., DiT), and VAEs.

- Hands-on experience in post-training or fine-tuning generative models (e.g., SFT, RLHF/DPO, LoRA/PEFT).

- Proficiency in mainstream deep learning frameworks such as PyTorch, JAX, or TensorFlow.

- Experience working with large-scale data and training pipelines, with an understanding of trade-offs among generation quality, stability, and inference cost.

Preferred

- Master’s or PhD degree in Computer Science, Artificial Intelligence, Mathematics, or a related technical field.

- Experience improving temporal and spatial consistency in video generation.

- Experience with multi-shot or long-form coherent video generation, such as scene continuity, shot transition modeling, or long-context video synthesis.

- Publications at top-tier machine learning, computer vision, or graphics venues such as CVPR, ICCV, ECCV, NeurIPS, ICML, ICLR, or SIGGRAPH.