Paper 'WebDreamer: Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents' accepted by Transactions on Machine Learning Research (TMLR), 2025
Paper 'Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge' accepted at NeurIPS 2025
Co-developed 'ScienceAgentBench' for rigorous assessment of language agents in data-driven scientific discovery, published at ICLR 2025
Contributed to 'RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments', arXiv preprint (2025)
Multiple publications in top-tier venues, including NeurIPS, TMLR, ICLR, and EMNLP