“Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents,” CVPR 2025 & CVPR Workshop on Multimodal Foundation Models 2025; introduced the MONDAY dataset (313K annotated frames from 20K instructional videos), enabling robust cross-platform generalization of mobile agents
“Mobile OS Task Procedure Extraction from YouTube,” NeurIPS Workshop on Video-Language Models 2024 (Non-Archival); proposed MOTIFY method for extracting task sequences from YouTube without manual annotation
“Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning,” ICLR 2024; proposed COCOA to address distributional shifts in offline RL
“MPChat: Towards Multimodal Persona-Grounded Conversation,” ACL 2023; constructed MPChat, a multimodal persona-grounded dialogue dataset, and demonstrated the critical role of the visual modality