About the job
As a Research Engineer on the Agent Orchestration team, you will design and build the systems that govern how Decagon agents operate in real-world environments. You will own complex, distributed systems that sit at the heart of the agent runtime: execution frameworks, model orchestration logic, and experimentation platforms that ensure agents are fast, reliable, and continuously improving. Your work will directly impact how agents reason, take actions, and deliver outcomes across millions of interactions.
This role operates in a fast-moving, ambiguous space with tight feedback loops. You’ll move fluidly between diagnosing production issues, designing new system abstractions, and running experiments to improve agent behavior. You’ll collaborate closely with Research, Infra, and Product teams to ship improvements safely and at scale.
Responsibilities
Design and evolve agent harnesses that power different product experiences
Build core runtime systems, including AOP execution and multi-model orchestration
Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
Optimize agent systems for latency, reliability, and production correctness
Analyze real-world failures and use data to drive iterative improvements
Build and operate online experimentation systems (A/B testing) and contribute to offline evaluation frameworks
Improve observability, testing, and simulation systems to ensure safe, measurable progress
Contribute to voice and real-time systems (e.g., transcription pipelines, turn-taking, latency improvements)
Continuously adapt orchestration systems as model capabilities evolve
Qualifications
Minimum
Strong experience building distributed systems or backend platforms in production environments
Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
Experience owning systems end-to-end, from design through production and iteration
Familiarity with experimentation, evaluation, or data-driven product improvement loops
A track record of improving system reliability, performance, and observability
Ability to debug complex systems and identify root causes of failures
Preferred
You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
You think in terms of control planes, feedback loops, and system-level optimization, not just features
You’re excited about diagnosing failure modes and iterating toward measurable improvements
You care deeply about production quality: not just making systems work, but making them reliable, safe, and scalable
You’re motivated by pushing the frontier of how intelligent systems behave in the real world