AI Research Scientist, Computer Vision - Facebook Video Intelligence

About the job

The Video Intelligence team is an applied AI research team within the Facebook pillar. This role will develop advanced video generation and understanding foundation models, enabling innovative AI-driven video creation experiences and improving our ability to understand video content. The team builds state-of-the-art GenAI technology for video generation and understanding.

Responsibilities

Build a variety of multimodal foundation models, such as text-to-video generative models, image-to-video generative models, video understanding models, and unified native video generative models

Design core foundation model architectures and carry out progressive pre-training

Post-train foundation models using techniques such as Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), and Low-Rank Adaptation (LoRA)

Conduct research to develop SOTA GenAI models for the Facebook family of apps

Collaborate with colleagues on the infrastructure and product teams to launch models

Qualifications

Minimum

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

PhD in Computer Science, Machine Learning, or a relevant technical field

1+ year of industry experience training multimodal, computer vision, LLM or related AI/ML models

Experience owning and/or driving complex technical projects end to end

Publications at peer-reviewed conferences (e.g. ICLR, NeurIPS, ICML, KDD, CVPR, ICCV, ACL)

Programming experience in Python and hands-on experience with frameworks such as PyTorch

Must obtain work authorization in the country of employment at the time of hire, and maintain ongoing work authorization during employment

Preferred

First-authored publications at peer-reviewed conferences (e.g. ICLR, NeurIPS, ICML, KDD, CVPR, ICCV, ACL)

Experience collaborating in cross-functional teams, including product, engineering, and research

Experience building text-to-video generative models, image-to-video generative models, video understanding models, and/or unified native video generative models