About the job
In this role, you will build a next-generation ML data and feature platform that significantly improves the productivity of ML practitioners. Our goal is to let ML practitioners easily define and test ML features and labels, while the platform takes care of computing, storing, and serving feature values for both high-throughput training and low-latency, member-scale inference use cases.
Responsibilities
Design and build a near-real-time feature computation engine to generate ML features for both high-throughput training and low-latency inference applications.
Operate and manage the feature computation pipelines and feature serving infrastructure for various ML models across multiple ML domains.
Build and scale systems that accelerate training through performant data loading, transformation, and writing.
Create frameworks to streamline and expedite the availability of new data for training and serving.
Develop feature stores that enable feature discovery and sharing.
Increase the productivity of ML practitioners by making it easy to define and access features and labels for experimentation and productization.
Qualifications
Minimum
Experience in building ML or data infrastructure
Strong empathy and passion for providing a fantastic user experience to ML practitioners
Experience in building and operating 24/7 high-traffic and low-latency online applications
Experience with large-scale data processing and streaming technologies such as Spark, Flink, and Kafka
Experience in working with and optimizing Scala and/or Python codebases
Experience with public clouds, especially AWS
Self-driven and highly motivated team player
Preferred
Experience in building and operating ML feature stores
Experience with functional programming
Experience working with notebooks such as Jupyter or Polynote