Published over 10 papers, with more than 3200 citations. Recent research achievements include the EMPG framework, the WideSearch benchmark, the DoctorAgent-RL system, and the paper 'Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks', accepted to ICCV 2025. Also contributed to the MCP Mark benchmark and the DeepSeek-VL2 project.
Research Experience
Gained industry experience through internships at MSRA, DeepSeek, and ByteDance during Ph.D. studies. At DeepSeek, contributed to the development of DeepSeek-VL2 and DeepSeek-V3. At MSRA, worked on the Microsoft OneOCR and Microsoft Document Intelligence projects under the guidance of researchers Qiang Huo and Lei Sun. Recently started an internship with the ByteDance Seed team, working on LLM/MLLM Agent projects.
Education
Bachelor's Degree: Graduated from the School of the Gifted Young at USTC in 2021, majoring in Computer Science. Doctoral Degree: Pursuing a Ph.D. in the joint program between USTC and MSRA, co-supervised by Prof. Qiang Huo at MSRA and Prof. Jun Du at USTC.
Background
Research Interests: Document Intelligence (including OCR, document layout analysis, and document understanding) and Large Language Models (including MLLM, Agent, and RAG). Professional Field: Computer Science. Brief Introduction: Currently a fourth-year Ph.D. student in the joint program between the University of Science and Technology of China (USTC) and Microsoft Research Asia (MSRA).
Miscellany
Currently based in Beijing and seeking full-time job opportunities. Feel free to reach out by email at jarvisustc@gmail.com for a resume or to set up a coffee chat.