Autonomous Agent Engineer

About the job

We're building the infrastructure that lets AI agents operate autonomously and securely at NVIDIA. This role owns the execution environments, state management systems, and security boundaries that make autonomous agents safe and reliable. The team designs and ships SDKs, CLIs, and developer tooling that turn complex sandboxing into a straightforward experience for agent builders and users across the company.

Responsibilities

Architect sandboxed compute environments where agents securely execute code, access tools, and interact with external services

Design and ship SDKs (Python, Go) and CLI tooling for provisioning and managing agent workloads in isolated environments

Create onboarding templates, reference implementations, and CLI workflows that make secure execution the default

Build state management for long-running agent operations, including checkpoint and recovery

Embed security into SDK primitives like isolation policies, secrets injection, network policies, capability declarations, and kill switches

Engineer auth integrations for workload identity, delegated tool access, and scope attenuation without static secrets

Build observability and audit infrastructure: structured logs, decision traces, security telemetry, and audit trails wired into enterprise monitoring

Qualifications

Minimum

BS or MS in Computer Science, Engineering, or related field (or equivalent experience)

8+ years building distributed systems, infrastructure, or developer platforms at scale

Deep systems engineering skills: containers, microVMs, Kubernetes, Linux security primitives

Track record of shipping developer SDKs or CLIs adopted by multiple teams

Experience building agents with various frameworks and harnesses in an enterprise context

Proficiency in Python, Go, Rust, or similar

Preferred

Experience building execution environments for agentic AI systems or LLM applications that execute code autonomously

Experience with sandboxing and isolation technologies (gVisor, Firecracker, Kata Containers, V8 isolates, or similar)

Strong security fundamentals: threat modeling, auth, least privilege, secrets management

Experience designing multi-tenant execution platforms, serverless infrastructure, or sandboxed compute at scale

Background in durable execution patterns or checkpoint/recovery systems for long-running workloads