Paper 'Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge' accepted to NeurIPS 2025 (Datasets & Benchmarks track)
Paper 'Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents' accepted as an Oral at ICLR 2025 (top 1.8%), with over 185K downloads on Hugging Face
Paper 'GPT-4V(ision) is a Generalist Web Agent, if Grounded' accepted to ICML 2024
Published in TMLR: 'Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents'
Published at COLM 2025: 'An Illusion of Progress? Assessing the Current State of Web Agents'
Served as reviewer for major conferences: ICLR'25–'26, UIST'25, ACL Rolling Review
Organized and participated in AI agent workshops including LLMAgents@ICLR'24 and Computer Use Agents@ICML'25
Research Experience
Conducting research on language agents and GUI agents within the OSU NLP Group
Led the development of the first open-source multimodal web agents (SeeAct, Multimodal-Mind2Web)
Proposed SeeAct-V, a vision-only GUI agent framework, and developed UGround, a new state-of-the-art visual grounding model
Built benchmarks for evaluating web agents: Online-Mind2Web (short- to medium-horizon tasks) and Mind2Web 2 (long-horizon agentic search tasks)
Collaborated with industry partners including Amazon AGI and Orby AI