Machine Learning Platform Engineer, AI Evaluation Platform (All levels)

Apple
Seattle, United States of America
2025-12-12

About the job

Join Apple Services Engineering to build the next generation of AI evaluation systems. We are seeking machine learning platform engineers at multiple levels (Mid-Level to Principal) to architect and build high-availability services and internal tools that enable self-service evaluation at scale. You will partner with researchers to operationalize their innovations, transforming complex workflows into intuitive, developer-first platforms. We are looking for builders who thrive in the ambiguity of new initiatives and are passionate about creating scalable infrastructure.

Responsibilities

System Design & Implementation: Design, code, and ship high-quality Python services. For senior candidates: Lead the architecture for the core evaluation engine and distributed services. For mid-level candidates: Own the end-to-end implementation of specific features and API endpoints.

Technical Leadership & Collaboration: Mentor junior engineers, conduct code reviews, and drive technical decision-making. Foster a culture of technical excellence and rapid delivery through example and collaboration.

Operationalizing Science: Partner closely with Applied Scientists to translate novel metrics, judge prompts, and scoring algorithms into scalable, production-grade services. Create frameworks to evaluate not just simple responses, but also multi-turn agent trajectories and tool usage.

System Integration: Serve as a technical bridge between the research organization and the broader engineering ecosystem, ensuring our tools integrate seamlessly with existing ML infrastructure and developer workflows.

Engineering Rigor: Champion the software development lifecycle (SDLC) for the team, writing comprehensive automated tests that run in CI/CD and instrumenting monitoring to ensure high availability and reliability.

Qualifications

Minimum

2+ years of hands-on software engineering experience (or Master's degree with relevant project experience). Note: We are hiring across multiple seniority levels; expectations will scale with experience.

Strong proficiency in the Python ecosystem (e.g., FastAPI, Pydantic, Pandas). You are capable of writing production-grade code and contributing to architectural discussions on day one.

Customer Obsession & Product Thinking: Experience acting as a technical partner to internal customers. You can translate vague requirements from other teams into concrete engineering specifications.

Demonstrated experience partnering with Data Scientists or Researchers: You have the ability to navigate the ambiguity of research workflows and operationalize scientific code.

Functional literacy in AI/ML concepts: You understand the fundamental lifecycle of machine learning (datasets, training vs. inference, evaluation metrics) and can discuss the engineering challenges involved in serving models.

Strong expertise in API Design & Internal Tools: You have built APIs that other developers rely on, with a focus on versioning, backward compatibility, and developer experience.

Operational excellence background: You have practical experience using CI/CD pipelines, containerization (Docker/Kubernetes), and monitoring (Datadog/Prometheus).

BS in Computer Science; Master's preferred.

Preferred

Experience building MLOps & Platform Infrastructure: You have architected the foundational infrastructure for AI, such as model registries, inference services, or feature stores (using tools like Kubernetes, Ray, or Kubeflow).

Deep familiarity with AI Evaluation Frameworks: You have used or contributed to modern evaluation tools like DeepEval, Ragas, TruLens, or LangSmith. You understand how to implement and scale model-based evaluation workflows.

Deep understanding of Generative AI & Agents: You understand the engineering challenges of relying on LLMs and Agents as software components—specifically managing token economics, handling rate limits, and evaluating non-deterministic, multi-step reasoning capabilities.

Builder Experience: You have thrived in startup-like environments, navigating high ambiguity to deliver complex technical roadmaps from scratch.