About the job
Unity Vector builds an offline ML platform that powers insight, experimentation, attribution, and AI-driven decision-making across the company. Our systems operate at scale across batch and streaming data, supporting analytics, product intelligence, machine learning pipelines, and business operations. As data volume and complexity grow, our platform enables large-scale model training, feature generation, and experimentation workflows that power production ML systems.
We’re looking for a Machine Learning Engineer to join our Offline Infrastructure team. This is an ideal role for a recent university graduate who is excited to work on large-scale systems and apply research-driven thinking to real-world machine learning problems.
You’ll help build and evolve the infrastructure that powers training data generation, ML workflows, and distributed model training. Working closely with experienced engineers and researchers, you’ll contribute to systems that ensure our ML pipelines are reliable, scalable, and efficient. This role offers the opportunity to bridge research and production, translating advanced ideas into systems that operate at scale.
Responsibilities
Build and maintain data pipelines that generate training datasets for machine learning models and experimentation
Contribute to infrastructure that supports distributed training workflows (e.g., PyTorch, Ray)
Work with workflow orchestration tools (e.g., Airflow, Flyte, or similar) to support multi-stage ML pipelines
Improve reproducibility and reliability through dataset validation, monitoring, and testing
Partner with ML engineers to support experimentation and model iteration
Help optimize performance and efficiency across data processing and training systems
Contribute to the evolution of our offline ML platform architecture as it scales
Qualifications
Minimum
Bachelor's degree in Computer Science, Machine Learning, Systems, or a related field
Strong foundation in machine learning systems, distributed systems, or large-scale data processing (through research or projects)
Experience with Python and with data-intensive workloads
Familiarity with ML frameworks (e.g., PyTorch, TensorFlow) and/or distributed systems (e.g., Ray, Spark)
Experience (academic or applied) with data pipelines, model training workflows, or large datasets
Strong problem-solving skills and ability to translate research ideas into practical systems
Interest in building scalable, reliable infrastructure for machine learning
Preferred
Experience with workflow orchestration systems (Airflow, Flyte, etc.)
Exposure to large-scale data platforms (data lakes, warehouses, streaming systems)
Publications or research in ML systems, distributed systems, or related areas