Central Data Platform Engineer - Software Dev Engineer I

Yahoo
Champaign, Illinois, USA

About the job

We're looking for a motivated entry-level AI Engineer to join our AI & ML team in Champaign, Illinois, where you'll have the opportunity to design and build scalable, high-performance tools in the areas of data governance, orchestration, and query technologies as part of the Central Data Platform Team. In this role, you will be instrumental in developing and deploying AI-powered features. Your responsibilities will include analyzing requirements, supporting prompt engineering and RAG workflows, and collaborating across product and engineering teams to integrate generative AI into production systems. Furthermore, this role contributes to delivering high-quality software and platform products that underpin Yahoo's enterprise data ecosystem.

Responsibilities

Assist in building AI features using Python, LLM APIs (OpenAI, Anthropic, etc.), and vector embedding pipelines.

Support prompt engineering and RAG workflows: design, test, and iterate on prompt templates, and integrate vector search.

Help build and maintain AI-model monitoring/observability dashboards that track model accuracy, latency, and drift, and work with backend engineers to integrate AI services into the product.

Participate in experimenting with AI workflows: multi-agent orchestration, model fine-tuning, system prompts.

Work through documents and conversations with colleagues to understand product requirements for new features.

Work closely with cross-functional teams to understand product and technical roadmaps, identify potential impacts on system operability, and propose proactive solutions for cloud environments.

Lead initiatives to enhance and optimize existing cloud infrastructure, drive improvements in scalability, efficiency, and resilience, and oversee large-scale projects related to cloud platforms, automation, and performance optimization.

Foster cross-functional collaboration between development, infrastructure, and operations teams to improve the overall performance, reliability, and security of services in the cloud.

Qualifications

Minimum

A solid computer science foundation in data structures and algorithms, object-oriented programming, and modern software engineering practices, gained through a degree in CS or a similar engineering field.

Proactive in staying updated with evolving AI trends and new LLM releases.

Skilled at diagnosing and solving complex, ambiguous problems with curiosity and a product-focused mindset.

Experience working with the latest Large Language Models (LLMs) and AI advancements, cloud-native AI services like SageMaker or Vertex AI, and LLM-orchestration libraries such as LangChain or LlamaIndex.

The ability to use an object-oriented programming language like Java or C++, or a scripting language like Python or Perl, on Unix or Linux systems.

Knowledge of SQL and distributed query engines (e.g., Presto, Trino, Athena, BigQuery). Familiarity with data concepts such as joins, aggregation, projection, and explosion.

The ability to work with large-scale distributed systems.

Strong analytical and problem-solving skills with the ability to work effectively in a cross-functional, collaborative environment.

Preferred

Working knowledge of AWS and GCP cloud environments, including core data and compute services (e.g., EMR, MWAA, S3, Lambda, ECS, BigQuery, Dataproc).

Experience with data pipeline orchestration tools and frameworks such as Oozie and Airflow.

Experience with query execution and optimization: designing and optimizing queries to run efficiently on platforms such as BigQuery, Hive, Pig, and Spark, ensuring high performance and scalability.

Familiarity with modern data architectures, including lakehouse and Medallion design patterns.

Understanding of data processing/data governance concepts.

Familiarity with AI-assisted engineering tools (e.g., Cursor, MCP, Copilot, agentic AI frameworks) and emerging AI/ML technologies that enhance data engineering productivity.

Experience working with Infrastructure as Code (IaC) tools, such as Terraform, Ansible, or CloudFormation, to automate and manage cloud infrastructure deployments.

Familiarity and working experience with Kubernetes and container-based orchestration.