AI Platform Engineer - Vice President

Morgan Stanley
New York, New York, United States of America | 2026-04-01 | Full time

About the job

Our mission is to develop a firmwide Artificial Intelligence (AI) Development Platform that aligns with the firm's Technology principles; drives efficiency, consistency, controls, security, and strong governance; and promotes innovation. The platform enables teams to build applications that leverage AI capabilities and accelerates the adoption of AI across our businesses. This role is for a platform engineering specialist who will help build the firmwide AI Development Platform and drive adoption of AI capabilities throughout the enterprise. We have multiple focus areas across the platform and are looking for energetic, multidisciplinary candidates who are eager to help provide scalable, secure, enterprise-wide solutions for the firm.

Responsibilities

Develop tooling and self-service capabilities for deploying AI solutions across the firm, leveraging Kubernetes/OpenShift, Python, authentication solutions, APIs, REST frameworks, etc.

Develop Terraform modules and cloud architecture to enable secure deployment and consumption of AI cloud services at scale.

Have a platform mindset and build common, reusable solutions to scale Generative AI use cases using pre-trained models as well as fine-tuned models.

Leverage Kubernetes/OpenShift to develop modern containerized workloads.

Integrate with capabilities such as large-scale vector stores for embeddings.

Author best practices for the Generative AI ecosystem, covering when to use which tools, available models such as GPT, Llama, and those hosted on Hugging Face, and libraries such as LangChain.

Analyze, investigate, and implement GenAI solutions focusing on Agentic Orchestration and Agent Builder frameworks.

Author and publish architecture decision records to capture major design decisions and product selections for building Generative AI solutions, including app authentication, service communication, state externalization, container layering strategy, and immutability.

Ensure AI platforms are reliable, scalable, and operational (e.g., blueprints for upgrade/release strategies such as Blue/Green; logging, monitoring, and metrics; automation of system management tasks).

Participate in all of the team's Agile/Scrum ceremonies.

Participate in the team's on-call rotation under a build/run team model.

Qualifications

Minimum

Bachelor's or Master's degree in Computer Science or related field, or equivalent job experience

10 years of experience in software engineering, design, and development.

Strong hands-on application development background in at least one prominent programming language, preferably Python (Flask or FastAPI).

Broad understanding of data engineering (SQL, NoSQL, Big Data, Kafka, Redis), data governance, data privacy and security.

Experience in development, management, and deployment of Kubernetes workloads, preferably on OpenShift.

Experience with designing, developing, and managing RESTful services for large-scale enterprise solutions.

Experience deploying applications on Azure, AWS, and/or GCP using infrastructure as code (IaC), e.g., Terraform.

Hands-on experience with multiprocessing, multithreading, asynchronous I/O, and performance profiling in at least one prominent programming language, preferably Python.

Ability to articulate technical concepts effectively to diverse audiences.

Excellent communication skills.

Demonstrated ability to work effectively and collaboratively in a global organization, across time zones, and across organizations.

Demonstrated DevOps experience and an understanding of CI/CD (e.g., Jenkins) and GitOps.

Knowledge of DevOps and Agile practices.

Preferred

Practitioner of unit testing, performance testing and BDD/acceptance testing.

Understanding of OAuth 2.0 protocol for secure authorization.

Proficiency with OpenTelemetry and observability tools such as Grafana, Loki, Prometheus, and Cortex.

Good knowledge of microservice-based architecture and industry standards for both public and private cloud.

Good understanding of modern application configuration techniques.

Hands-on experience with cloud application deployment patterns such as Blue/Green.

Good understanding of state sharing between scalable cloud components (e.g., Kafka, dynamic distributed caching).

Good knowledge of various data stores (SQL, Redis, Kafka, etc.) for cloud application storage.

Experience building AI applications, preferably Generative AI and LLM based apps.

Deep understanding of AI agents, agentic orchestration, and multi-agent workflow automation, along with hands-on experience in agent builder frameworks such as LangChain and LangGraph.

Experience with Generative AI development, embeddings, and fine-tuning of Generative AI models.

Understanding of ModelOps/MLOps/LLMOps.

Understanding of SRE techniques.