About the job
As part of Machine Learning, Systems and Cloud AI (MSCA), we believe that high-quality data is key to building better Machine Learning (ML) models, especially in the era of Large Language Models (LLMs). We work directly with model teams to define, measure, and improve data quality, promote best practices, and increase data availability and awareness.
Responsibilities
Work with large data sets. Conduct analysis that includes data gathering and requirements specification, processing, cleaning and curation, analysis, visualization, ongoing deliverables, and presentations.
Present analysis to stakeholders and organization executives in order to share insights, influence product direction, and answer difficult questions regarding data quality measurement and its impact on model performance.
Define key metrics that are statistically sound and meaningful to measure data quality for data in various shapes and forms, as well as to measure progress of customer engagement.
Research and develop analysis and optimization methods to improve the quality of Google's ML portfolio and applications, including LLM and training data planning.
Conduct independent research to advance the understanding of how data impacts the ultimate quality of large language models, and create spend optimization priorities for data acquisition.
Qualifications
Minimum
Master's degree in Statistics, Data Science, Mathematics, Physics, Economics, Operations Research, Engineering, or a related quantitative field.
10 years of work experience using analytics to solve product or business problems, coding (e.g., Python, R, SQL), querying databases or statistical analysis, or 8 years of work experience with a PhD degree.
Preferred
12 years of work experience using analytics to solve product or business problems, coding (e.g., Python, R, SQL), querying databases or statistical analysis, or 10 years of work experience with a PhD degree.
6 years of experience as a people manager within a technical leadership role.