Principal Software Engineer

About the job

Red Hat is seeking a Principal Software Engineer to join our team. In this role, you will collaborate with a diverse, highly motivated group of engineers to design and implement Agentic AI features and solutions and to integrate partner solutions. You will also work closely with product management, other engineering teams at Red Hat, Red Hat partners, and lighthouse customers.

Responsibilities

Architectural Leadership: Lead the implementation of scalable, distributed computing solutions designed to serve Agentic AI and ensure seamless integration with the Red Hat product portfolio.

MAS Design: Define and implement Multi-Agent System (MAS) architectures, including orchestration layers, state machines, tool registries, and resilient routing policies with safe fallbacks.

Interoperability Standards: Implement Model Context Protocol (MCP) for standardized tool and data access, and Agent-to-Agent (A2A) or ACP protocols for cross-platform agent communication and task delegation.

Upstream Influence: Contribute to and influence upstream AI/ML communities to steer the evolution of open standards for agentic workflows.

Strategic Collaboration: Partner with AI/ML vendors and internal teams to refine AI strategies, addressing specific use cases that drive value through Red Hat’s next-generation UX.

Reference Architectures: Develop technical blueprints and multi-product demos that showcase the "Art of the Possible" using the Red Hat AI stack.

Innovation: Proactively explore emerging AI technologies to identify opportunities for incorporating new capabilities into software development workflows and tooling.

Engineering Excellence: Drive AI integration within the software development lifecycle (SDLC), sharing the results of successful experiments with stakeholders to foster broader innovation.

Qualifications

Minimum

7+ years of relevant software engineering experience

Bachelor’s degree in Computer Science or a related technical field, or equivalent practical experience

Agentic Frameworks: Proven experience building agents and tooling frameworks; deep expertise in LangGraph, PydanticAI, or similar state-management libraries.

Core AI Engineering: Experience implementing sophisticated RAG, long-term memory systems, semantic caches, and vector databases.

Systems Expertise: Expert-level proficiency in Python or Go, with a specific focus on building resilient, asynchronous distributed systems.

Infrastructure: Solid experience with containers and orchestration via OpenShift or Kubernetes.

Inference Optimization: Familiarity with model parallelization, quantization, and memory optimization (e.g., vLLM, DeepSpeed, OpenVINO).

AI/MLOps: Experience with GitOps, automation pipelines, and managing the AI/ML lifecycle in production environments.

Evaluation & Safety: Direct experience with agent evaluation (eval) frameworks that measure success and hallucination rates, and with implementing guardrails and governance that prevent prompt injection and infinite loops.

Preferred

Cloud computing experience with AWS, GCP, Azure, or IBM Cloud.

A history of open-source contributions or active participation in the AI/ML community (GitHub, Research, or Upstream).