About the job
In this role, you will design and maintain infrastructure for collecting, processing, and analyzing massive volumes of CI results data, and you will integrate AI capabilities that transform how developers interact with failure information. Your work will turn complex failure signals into actionable insights that accelerate development, using AI to summarize failures, separate genuine issues from distractions, and help engineers focus on what needs attention.
Success requires flexibility, proactivity, and the ability to thrive in a supportive environment with challenging problems. You'll need excellent judgment for timely technical decisions, the ability to collaborate effectively in design discussions, and strong technical depth to make informed tradeoffs about when and how AI can genuinely improve developer workflows.
As a CI Systems Engineer, you will work at the intersection of AI and developer tools, shaping how thousands of engineers across Apple diagnose failures. You'll have the autonomy to evaluate new AI approaches, influence infrastructure architecture, and see your work directly reduce friction in Apple's software development process.
Responsibilities
Develop AI-assisted failure analysis systems that transform raw CI data into actionable insights, helping developers quickly diagnose root causes and understand test failure patterns
Design and implement AI-powered triage workflows that intelligently summarize failures, identify patterns across large result sets, and distinguish signal from noise
Build and integrate tools that give AI systems structured access to CI data, enabling intelligent querying and analysis
Optimize data structures and database design for fast storage and deduplication of build and test failures, ensuring AI systems have efficient access to the context they need
Drive performance improvements and optimization initiatives for results storage and query latency to meet developer needs
Collaborate with OS engineering teams across platforms (iOS, macOS, etc.) to understand diagnostic needs and refine AI-assisted analysis capabilities
Implement observability and alerting for the CI results infrastructure itself, ensuring reliability and detecting systemic failure patterns
Evaluate and iterate on AI approaches, measuring their effectiveness at reducing triage time and improving developer experience
Share technical knowledge and best practices with team members on failure analysis, AI integration patterns, data systems design, and infrastructure challenges
Qualifications
Minimum
BS in Computer Science or equivalent professional experience
8+ years of software engineering experience, preferably with 2+ years focused on CI infrastructure, data systems, or failure analysis
Experience applying AI/ML or LLM-based approaches to software development workflows, tooling, or automation
Proficiency in one or more languages suited to systems and data work (Swift, Scala, Python, Go, C/C++, etc.)
Proven ability to work independently on complex problems and collaborate effectively on team initiatives
Strong communication skills to collaborate with diverse teams and translate complex failure data into developer-friendly insights
Demonstrated experience in designing or contributing to systems that handle scale, data integrity, and query performance
Preferred
Experience building or integrating with AI agents using current tools such as Skills, MCP Servers, Plugins, or LLM-powered tooling
Proven experience integrating AI into developer workflows with measurable impact on engineering efficiency, for example in code review, testing, debugging, triage, or productivity tooling
Familiarity with machine learning techniques applied to failure correlation, anomaly detection, or pattern recognition
Deep expertise in data storage, retrieval, and analysis, including experience with relational and NoSQL databases
Experience building data pipelines or working with distributed data processing frameworks
Background working on large-scale data systems, observability platforms, or analytics infrastructure
Experience with CI/CD failure analysis, test result aggregation, or build system diagnostics, including root cause analysis, diagnostic tooling, and observability practices
Knowledge of iOS or macOS internals, development environments, build agents, and testing infrastructure