About the job
The Senior iCloud Efficiency Engineer will play a critical role in advancing Apple’s next generation of intelligent infrastructure operations through applied GenAI and agentic technologies. This role focuses on building practical, high-impact AI systems that improve engineering workflows and infrastructure decision-making. You will identify high-leverage operational problems, set architecture direction, design agentic solutions, and guide teams from prototype to production adoption. The goal is to combine LLM reasoning, system context, automation frameworks, and engineering safeguards to improve speed, reliability, and efficiency. Success in this role will be measured by concrete outcomes: adoption of shared patterns and tools by multiple teams, measurable toil reduction, and validated cost or capacity savings. You will help define how AI can safely and effectively augment engineering teams, from capacity optimization and deployment analysis to incident response, forecasting, and infrastructure planning.
Responsibilities
Design and implement GenAI-powered solutions to improve infrastructure efficiency, operational workflows, and engineering productivity across iCloud services
Build and deploy agentic systems using technologies such as Claude, LLM orchestration frameworks, skills-based execution models, and intelligent automation pipelines
Develop AI-assisted workflows for capacity planning, anomaly detection, forecasting, deployment validation, and operational safety
Partner with SRE, infrastructure engineering, platform teams, and finance to identify high-value efficiency opportunities and convert them into scalable AI solutions
Create reliable and safe agent workflows with strong observability, guardrails, human-in-the-loop validation, and operational controls
Improve engineering decision-making by combining infrastructure telemetry, operational context, and LLM reasoning into actionable recommendations
Drive experimentation and adoption of AI-first engineering practices that reduce toil, improve reliability, and optimize cost structures
Contribute to the long-term strategy for AI-assisted infrastructure operations across Apple Cloud Services
Qualifications
Minimum
5+ years of experience in software engineering, infrastructure engineering, or large-scale cloud services environments
Proven experience designing, building, or technically leading production GenAI, ML platform, developer productivity, infrastructure automation, or tooling systems
Hands-on experience with GenAI technologies and LLM application architecture, including retrieval, context engineering, tool use, workflow orchestration, agentic workflows, evaluation, observability, and failure handling
Demonstrated technical leadership across teams, including architecture reviews, roadmap influence, mentoring, and driving adoption of shared engineering practices
Strong understanding of cloud infrastructure operations, observability, deployment systems, and operational safety principles
Proven ability to translate ambiguous operational challenges into practical engineering solutions with measurable business impact
Strong software development skills in Python, Java, or similar languages
Exceptional analytical, systems thinking, and cross-functional communication skills
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
Preferred
Experience applying GenAI to infrastructure operations, SRE workflows, capacity planning, or engineering productivity systems
Experience building AI systems with operational guardrails, governance models, and safe deployment patterns for enterprise environments
Strong understanding of capacity forecasting, cost optimization, and infrastructure efficiency modeling at hyperscale
Background working in private cloud environments, large-scale storage systems, or global distributed infrastructure
PhD or other advanced degree in Computer Science, Machine Learning, Distributed Systems, or a related field