Machine Learning Research Engineer (Human Sensing), SIML

About the job

As a Machine Learning Research Engineer, you will be responsible for designing and developing cutting-edge AI/ML models for Human Sensing, with a focus on building robust cross-domain identity recognition systems. Multi-modal Human Sensing is a foundational capability that powers intelligent experiences based on key human traits such as identity, expression, clothing, action, gesture, gaze, and human-object interaction. Major Apple Intelligence experiences such as personalized Natural Language Search, Memories Creation, and personalized Image Generation are powered by our ability to learn robust representations of visual human traits. Efficient real-time visual human sensing powers flagship Photography experiences such as Cinematic mode and Photographic Styles, as well as communication experiences such as Center Stage, and it paves the way for more natural human-device interactions, e.g., with the DockKit framework.

Responsibilities

Designing, implementing, and deploying state-of-the-art visual recognition systems.

Building foundation models for facial and full-body perception.

Driving data quality excellence through strategic dataset curation, validation, and generation to support world-class model development.

Building tools and frameworks for systematic failure analysis, identifying edge cases, and driving continuous model improvement.

Working directly with cross-functional stakeholders to gather product requirements and translating them into actionable plans for ML research and development.

Effectively communicating results and insights to partners and senior leaders, providing clear and actionable recommendations.

Staying current with the latest trends, technologies, and standard methodologies in machine learning, multi-modal foundation models, computer vision and natural language understanding.

Actively contributing to Apple's ML community by disseminating research ideas and results, enhancing shared infrastructure, and mentoring fellow practitioners.

Qualifications

Minimum

Master's or Ph.D. in Computer Science, Computer Engineering, or a related field; or equivalent professional experience in Computer Vision (CV) and Machine Learning (ML) research and development.

Proven track record of training, fine-tuning and evaluating deep learning models for vision tasks using modern ML architectures.

Background in research and innovation, demonstrated through publications in top-tier journals or conferences, patents, or impactful software developments.

Proficiency in Python and in PyTorch or an equivalent deep learning framework.

Preferred

Expert-level knowledge of state-of-the-art methods in face recognition or other facial analysis and biometric systems.

Hands-on experience training and scaling multi-modal large language models (LLMs) or large-scale vision-language models (VLMs).