Staff AI Development Engineer (Enterprise AI Ecosystem)

About the job

As a Staff AI Development Engineer, you will be a lead architect in the design and implementation of our proprietary internal AI framework. You will move beyond simple "prompt engineering" to build robust, stateful, and cyclic multi-agent systems using LangGraph and LangChain. This is a high-impact software engineering role. You will not rely on rigid, off-the-shelf agent frameworks (like AutoGen or CrewAI). Instead, you will use low-level primitives to architect custom control flows, enabling LLMs to reason, plan, and execute complex tasks across Qualcomm’s massive engineering data landscape. You will define how agents maintain state, handle interrupts, and interact with internal APIs, setting the standard for AI application development across the company.

Responsibilities

1. Advanced Agentic Architecture

Custom Workflow Design: Architect and implement complex, cyclic AI workflows using LangGraph. Design state machines that enable agents to plan, execute, reflect, and retry tasks autonomously.

State Management: Engineer robust persistence layers (checkpointers) to manage long-running agent conversations, enabling "human-in-the-loop" interactions where users can guide or correct agent behavior mid-flight.

Multi-Agent Orchestration: Build custom orchestration layers where specialized agents (e.g., "Code Reviewer," "Doc Searcher," "Test Generator") collaborate via defined graph edges and conditional routing logic.

2. Framework & Tool Development

Internal SDK Evolution: Contribute core features to Qualcomm’s internal Python AI SDK, simplifying how other teams consume LLMs, embeddings, and vector stores.

Tool Abstraction: Develop standardized interfaces (based on LangChain tools) that allow agents to securely interact with enterprise systems (Jira, GitLab, SQL databases, proprietary APIs).

Graph-Based RAG: Go beyond simple semantic search by implementing GraphRAG and advanced retrieval strategies. Build pipelines that combine vector search with knowledge graph traversals to answer complex engineering queries.

3. Production Engineering & Scalability

Asynchronous Python: Write high-performance, non-blocking Python code using asyncio to handle concurrent agent execution and high-throughput token streaming.

API Design: Expose agent workflows via FastAPI microservices, ensuring strict typing and validation (Pydantic) and comprehensive OpenAPI documentation.

Evaluation & Observability: Design "LLM-as-a-Judge" evaluation pipelines to measure the efficacy of custom agent graphs. Implement tracing (e.g., LangSmith or internal equivalents) to debug complex chains of thought.

4. Technical Leadership

Pattern Definition: Define and document design patterns for agentic AI (e.g., Plan-and-Solve, ReAct, Map-Reduce) for the wider engineering organization.

Mentorship: Serve as a technical mentor to Senior and Junior engineers, conducting rigorous code reviews and driving best practices in Python software design and AI safety.

Qualifications

Minimum

• Bachelor's degree in Engineering, Information Systems, Computer Science, or related field and 4+ years of Software Engineering or related work experience.

OR Master's degree in Engineering, Information Systems, Computer Science, or related field and 3+ years of Software Engineering or related work experience.

OR PhD in Engineering, Information Systems, Computer Science, or related field and 2+ years of Software Engineering or related work experience.

• 2+ years of work experience with a programming language such as C, C++, Java, or Python.

Preferred

Master’s or PhD in Computer Science or Artificial Intelligence.

Advanced RAG: Experience implementing Hybrid Search (Keyword + Semantic), Re-ranking (Cross-Encoders), or Parent-Child document retrieval strategies.

Graph Theory: Academic or practical understanding of graph theory, finite state machines (FSM), and their application to AI control flow.

Database Expertise: Hands-on experience with Vector Databases (Milvus, Qdrant, Chroma) and Graph Databases (Neo4j).

DevOps/MLOps: Familiarity with Docker, Kubernetes, and CI/CD pipelines for deploying AI applications.

Evaluation Frameworks: Experience with frameworks like Ragas or DeepEval for quantifying RAG and Agent performance.