AI Evaluation Program Manager

About the job

You will be a vital member of our ML Data Team, which leads the full spectrum of video-language data preparation and model evaluation. This role comes with high ownership and includes responsibilities such as defining dataset needs and requirements in consultation with our research and product teams; designing and building data pipelines; and driving our post-training model evaluation strategy. You will also be responsible for automating as much of the repetitive partnership, annotation, and quality evaluation work as possible. A desire to work cross-functionally and to build relationships is critical for success in this position.

Responsibilities

- Model Evaluation: Design and build robust model evaluation frameworks, automating repetitive processes and balancing efficiency with depth when gathering evaluation metrics and feedback.

- Portfolio Monitoring: Manage resource allocation and timelines, adjusting direction flexibly based on real-time information across all data streams in your product vertical.

- External Partner Collaboration: Enhance dataset and process quality through seamless collaboration with vendors and outsourcing partners.

- Data Quality & Tooling Advancement: Establish labeling guidelines, monitor data quality, and improve tools and infrastructure to build a sustainable data operations framework.

- Internal Collaboration: Partner with Engineering and AI Model teams to align on top-priority data needs, design tools such as analytical reports and dashboards, and clearly communicate project progress.

Qualifications

Minimum

- 5+ years of experience working in an AI-focused data operations organization.

- A proven track record of designing and executing large-scale data or evaluation projects, including gathering, labeling, and post-processing data.

- The ability to analyze messy and complex data, identify overarching patterns, and distill your findings into crisp annotation guidelines or model quality reports.

- Proficiency with Python, LLMs, or other popular industry tools for automation.

- Excellent communication and project management skills, and the ability to support several projects simultaneously.

- A foundational understanding of, and interest in, LLMs/VLMs and multimodal AI.

- Conviction that data is the key ingredient for the performance and assessment of AI models.

Preferred

- Experience in data collection and labeling for multimodal language models.

- Experience in red teaming, localization testing, or other evaluation-focused fields.

- Experience working with research scientists and engineers.

- Expertise or interest in video-centric domains, such as sports, advertising, and content creation.