About the job
Imagine shaping how millions of people discover content they love on the App Store, Apple Music, and Apple TV+. Our team is responsible for the intelligence that powers these deeply personal experiences. We are at a pivotal moment, defining the next generation of personalization. We build the foundational capabilities that empower product and research teams to deliver hyper-personalized experiences while maintaining an uncompromising commitment to user privacy. We believe that deep personalization shouldn't require compromising user trust, and we are pioneering the decentralized data systems to prove it.
Responsibilities
Architect Distributed Feature Access: Design and build the access layer that abstracts the physical location of data. Ensure that inference systems can seamlessly access real-time on-device context, cloud-based service history, and content metadata through a unified, familiar API.
Engineer Large-Scale Feature Pipelines: Build robust, petabyte-scale pipelines that ingest and combine disparate data into coherent user profiles and rich content representations.
Architect Training Data Systems: Transform raw data into the high-value features that train our next-generation ML models. Architect the systems that generate this data and seamlessly integrate it with our training infrastructure.
Optimize for Privacy & Scale: Build highly optimized stacks that extend existing data systems into privacy-constrained environments. Implement data minimization strategies to securely leverage rich user features without compromising trust.
Cross-Functional Innovation: Partner closely with data systems teams, core compute engineers, and ML teams to ensure the right context is delivered to the right compute environment at exactly the right time.
Qualifications
Minimum
BS or MS in Computer Science, Data Engineering, Software Engineering, or a related field.
Senior-Level Experience: A proven track record of shipping complex, large-scale data engineering, feature serving, or machine learning systems to production.
Mastery of Big Data & Serving: Expertise in designing distributed data processing systems using technologies like Spark and Flink, and in building low-latency, high-throughput data serving layers or feature stores.
Strong Software Engineering: Deep proficiency in Java or Go for building high-performance production backend systems, and in Python for working within the model training ecosystem.
Strategic Data Mindset: Demonstrated experience thinking critically about data architecture, including data ontology, discoverability, and bridging distributed data sources.
Preferred
Hybrid/Edge Computing: Experience building systems that bridge cloud backend systems with on-device or edge compute environments.
Embeddings & Vector Search: Familiarity with generating, managing, and serving dense embeddings for retrieval, ranking, and personalization systems.
Data Governance: Experience building feature stores or data catalogs, or implementing compliance-by-design in a regulated environment.
Privacy-Preserving Tech: Passion for privacy and an understanding of data minimization strategies, secure enclaves, or Privacy-Enhancing Technologies (PETs).