Publications
He has published multiple papers, including 'Reasoning-Intensive Regression', 'GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning', and 'WARP: An Efficient Engine for Multi-Vector Retrieval'. He won the Best Paper Award at SIGIR 2025.
Research Experience
He worked as a Research Scientist at Databricks. His research directions include: (1) building reliable AI systems with language models and (2) developing effective and efficient retrieval models. He developed influential open-source research systems such as the DSPy framework and the ColBERT retrieval model.
Education
He earned his Ph.D. in Computer Science from Stanford, where he was advised by Matei Zaharia and Christopher Potts and was part of the Stanford NLP Group. During his Ph.D., he was supported by the Apple Scholars in AI/ML PhD Fellowship.
Background
He is an Assistant Professor at MIT EECS and a member of CSAIL. His research interests span natural language processing (NLP) and AI systems, specifically how to program intelligent software systems that are partly specified in natural language, process natural language at scale, and optimize quality and cost using language models.