AI Data Foundation Research Engineer · Hewlett Packard Enterprise

About the job

The successful candidate will develop new methods for context discovery, retrieval, filtering, prioritization, multi-modal data representation, advanced reasoning, tool calling, and reasoning trace validation in conversational, deep research, and agentic AI workflows. The role also covers the capture, management, search, enhancement, and interpretation of metadata and lineage for AI pipelines to enable their reproducibility, reuse, and optimization; the discovery, selection, and use of relevant, high-quality data for trustworthy AI outcomes across multiple AI applications; the development, evaluation, and testing of Foundation AI models for different modalities, including Natural Language Processing (NLP), Large Language Models (LLMs), Time Series Analysis, Computer Vision, and AI for Science; and the augmentation of AI models with structured knowledge (i.e., knowledge-infused learning).

Responsibilities

Develop new methods for context discovery, retrieval, filtering, prioritization, multi-modal data representation, advanced reasoning, tool calling, and reasoning trace validation in conversational, deep research, and agentic AI workflows.

Develop methods to capture, manage, search, enhance, and interpret metadata and lineage for AI pipelines, enabling reproducibility, reuse, and optimization of those pipelines.

Discover, select, and use relevant, high-quality data for trustworthy AI outcomes across multiple AI applications.

Develop, evaluate, and test Foundation AI models for different modalities, including Natural Language Processing (NLP), Large Language Models (LLMs), Time Series Analysis, Computer Vision, and AI for Science.

Augment AI models with structured knowledge (i.e., knowledge-infused learning).

Qualifications

Minimum

PhD in Computer Science or a related field with a focus on data engineering and data science, in particular Machine Learning, Deep Learning, and/or data management for AI, plus 3 years of relevant industry experience

Research experience in Generative AI, Deep Learning, and Machine Learning

Experience with advanced AI model architectures: LLMs, Time Series Foundation Models, Diffusion Models, etc.

Expertise in end-to-end pipelines for AI and Machine Learning, in particular the data layer underlying the pipeline

Preferred

Strong programming skills in Python with high proficiency in data structures and algorithms, as well as C/C++ skills

Experience with CI/CD code development

Outstanding analytical and problem-solving skills

Experience with hybrid AI-HPC workflows (e.g., AI surrogate modeling, computational steering of experiments)

Experience with knowledge graphs and knowledge-infused learning

Research expertise in data and workflow management systems

Experience in system software performance and scalability optimization

Experience with multi-threaded programming, parallel processing, OOD/OOP/distributed programming

Experience with containerized development and orchestration tools (e.g., Kubernetes, Ezmeral)