About the job
We are hiring a researcher with a strong technical background in image/video generation and editing, as well as multimodal foundation models. You will play a critical role in the research and development of multimodal foundation models for image/video/3D generation, editing, animation, and more. As a member of the team, you will have the opportunity to develop fundamental model capabilities, work on ambitious projects with team members from diverse backgrounds, and collaborate broadly across Apple with world-class engineers and researchers to advance our products and delight millions of users.
Responsibilities
Developing, fine-tuning, and evaluating foundational image generation and image editing models, as well as unified multimodal foundation models capable of both visual understanding and generation.
Developing, fine-tuning, and evaluating domain-specific image generation and editing models for various tasks and applications in Apple’s AI-powered products.
Conducting innovative research and transferring pioneering research in generative AI to production-ready technologies.
Understanding product requirements and translating them into modeling and engineering tasks.
Qualifications
Minimum
PhD, MS, or equivalent experience.
Experience in machine learning, deep learning and statistical modeling.
Experience in developing models for computer vision tasks, such as object detection and visual question answering.
Experience in image generation models, such as VAE, GAN, and diffusion models.
Proficiency in one of the following deep learning frameworks: PyTorch, JAX, TensorFlow.
Proficiency in one of the following languages: Python, Go, Java, C++.
Preferred
Experience in developing state-of-the-art image generation/editing models.
Good interpersonal skills and a collaborative, team-oriented approach.