Senior iCloud Efficiency Engineer (GenAI & Agentic Systems)

About the job

The Senior iCloud Efficiency Engineer will play a critical role in advancing Apple’s next generation of intelligent infrastructure operations through applied GenAI and agentic technologies. This role focuses on building practical, high-impact AI systems that improve engineering workflows and infrastructure decision-making. You will identify high-leverage operational problems, set architecture direction, design agentic solutions, and guide teams from prototype to production adoption. The goal is to combine LLM reasoning, system context, automation frameworks, and engineering safeguards to improve speed, reliability, and efficiency. Success in this role will be measured by concrete outcomes: adoption of shared patterns and tools by multiple teams, measurable toil reduction, and validated cost or capacity savings. You will help define how AI can safely and effectively augment engineering teams, from capacity optimization and deployment analysis to incident response, forecasting, and infrastructure planning.

Responsibilities

Design and implement GenAI-powered solutions to improve infrastructure efficiency, operational workflows, and engineering productivity across iCloud services

Build and deploy agentic systems using technologies such as Claude, LLM orchestration frameworks, skills-based execution models, and intelligent automation pipelines

Develop AI-assisted workflows for capacity planning, anomaly detection, forecasting, deployment validation, and operational safety

Partner with SRE, infrastructure engineering, platform teams, and finance to identify high-value efficiency opportunities and convert them into scalable AI solutions

Create reliable and safe agent workflows with strong observability, guardrails, human-in-the-loop validation, and operational controls

Improve engineering decision-making by combining infrastructure telemetry, operational context, and LLM reasoning into actionable recommendations

Drive experimentation and adoption of AI-first engineering practices that reduce toil, improve reliability, and optimize cost structures

Contribute to the long-term strategy for AI-assisted infrastructure operations across Apple Cloud Services

Qualifications

Minimum

5+ years of experience in software engineering, infrastructure engineering, or large-scale cloud services environments

Proven experience designing, building, or technically leading production GenAI, ML platform, developer productivity, infrastructure automation, or tooling systems

Hands-on experience with GenAI technologies and LLM application architecture, including retrieval, context engineering, tool use, workflow orchestration, agentic workflows, evaluation, observability, and failure handling

Demonstrated technical leadership across teams, including architecture reviews, roadmap influence, mentoring, and driving adoption of shared engineering practices

Strong understanding of cloud infrastructure operations, observability, deployment systems, and operational safety principles

Proven ability to translate ambiguous operational challenges into practical engineering solutions with measurable business impact

Strong software development skills in Python, Java, or similar languages

Exceptional analytical, systems thinking, and cross-functional communication skills

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field

Preferred

Experience applying GenAI to infrastructure operations, SRE workflows, capacity planning, or engineering productivity systems

Experience building AI systems with operational guardrails, governance models, and safe deployment patterns for enterprise environments

Strong understanding of capacity forecasting, cost optimization, and infrastructure efficiency modeling at hyperscale

Background working in private cloud environments, large-scale storage systems, or global distributed infrastructure

PhD or other advanced degree in Computer Science, Machine Learning, Distributed Systems, or a related field