Research interests include LLM efficiency (e.g., KV cache retrieval, offloading, compression, and other inference optimization topics), LLM architecture (e.g., native sparse attention, test-time training), and LLM memorization (e.g., parametric memory, agent memory). He considers long context to be the most important problem in LLMs.