About the job
Yahoo serves as a trusted guide for hundreds of millions of people globally, helping them achieve their goals online through our portfolio of iconic products. For advertisers, Yahoo Advertising offers omnichannel solutions and powerful data to engage with our brands and deliver results.
A Little About Us: The Yahoo! Consumer Data Team manages a petabyte warehouse to glean insights on Yahoo Media products and to improve the experience for its massive user base. The team works across multiple organizations at Yahoo to grow user engagement and improve user experience across Yahoo's product portfolio. Your work will directly influence product changes, and you will work with some of the brightest engineers you have ever met to improve the user experience on Yahoo properties and contribute to company growth. Along the way, you will solve problems for an internet pioneer that is hard to match in the industry.
Summary: The ideal candidate will have strong AI/ML experience and will design, build, and optimize scalable data pipelines and infrastructure that power advanced analytics solutions. In this role, you will collaborate closely with software engineers and business stakeholders to prepare and transform large datasets, support end-to-end model development and deployment, and ensure robust, efficient, and secure data flows. You will leverage your expertise in cloud platforms, big data tools, and machine learning frameworks to drive innovation and deliver actionable insights that advance our organization's AI initiatives and business objectives.
Responsibilities
Design, build, and maintain scalable data pipelines and ETL processes to support machine learning and AI initiatives on Google Cloud Platform (GCP).
Implement and optimize data storage solutions using GCP services such as BigQuery, Cloud Storage, and Dataflow.
Ensure data quality, integrity, and security throughout the data lifecycle.
Collaborate with analysts and business stakeholders to understand data requirements and deliver actionable insights.
Monitor, troubleshoot, and maintain the health and performance of cloud-based data infrastructure.
Automate manual processes and repetitive tasks to improve efficiency and reduce errors.
Apply data governance and compliance best practices to protect sensitive information and meet regulatory standards.
Stay current with new GCP features, tools, and best practices to continuously enhance data management capabilities.
Document solutions, processes, and architectural decisions to facilitate knowledge sharing and maintainability.
Qualifications
Minimum
BS or MS in Computer Science or a related major, or equivalent experience.
7+ years of software engineering experience, with a strong emphasis on system design and backend development.
2+ years of hands-on experience with the Google Cloud Platform ecosystem (BigQuery, Dataproc, Composer, Dataflow, Data Catalog, Observability) or the AWS equivalent.
Proven ability to design, build, and maintain data pipelines that support machine learning and AI model development, training, and deployment.
Fluency in at least one object-oriented programming language (Java, Python, or Scala) is highly desirable, as these skills are critical for developing robust applications and managing data workflows effectively. SQL proficiency is also valued for database operations.
Familiarity with data security, compliance, and governance best practices.
Strong problem-solving skills, attention to detail, and ability to work collaboratively with cross-functional teams.
Excellent communication skills, including the ability to tell insightful stories using data and to manage communication with internal teams and stakeholders.
Preferred
Exposure to AI-assisted development tools such as Claude, GitHub Copilot, Cursor, or similar is highly desirable.
Experience with Google Analytics 360 is a plus.