- Paper: Tokasaurus: An LLM Inference Engine for High-Throughput Workloads
- Paper: CodeMonkeys: Scaling Test-Time Compute for Software Engineering
- Preprint: Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
- Preprint: Hydragen: High-Throughput LLM Inference with Shared Prefixes
Research Experience
- 3D generative model research at Nvidia's Toronto AI Lab, advised by Professor Sanja Fidler
- Theoretical and recommender system research at Layer 6 AI
- Computer vision research at Akasha Imaging (acquired by Intrinsic)
- Currently working in the Scaling Intelligence Lab at Stanford
Education
- Stanford University, working with Professor Azalia Mirhoseini
- University of Oxford, supervised by Professor Ronald Clark
- University of Waterloo, majoring in Software Engineering with a joint major in Combinatorics and Optimization
Background
Third-year PhD student, previously at Oxford and now at Stanford. Broadly interested in research addressing the gaps that prevent models from being applied to real-world tasks that are currently out of reach. This includes improved long-context understanding and data efficiency, methods that allow models to continually learn from new experiences, and architectures whose capability scales better with test-time compute. Currently researching how to train natively parallel reasoning models with RL.