- Pixel Motion as Universal Representation for Robot Control (arXiv:2505.07817)
- LLaRA: Supercharging Robot Learning Data for Vision-Language Policy (ICLR 2025)
- Understanding Long Videos in One Multimodal Language Model Pass (ICLR 2025)
- xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs (arXiv:2410.16267)
- Diffusion Illusions: Hiding Images in Plain Sight (SIGGRAPH 2024)
- Mirasol3B: A Multimodal Autoregressive Model for Time-aligned and Contextual Modalities (CVPR 2024)
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (CoRL 2023)
- Active Vision Reinforcement Learning under Limited Visual Observability (NeurIPS 2023)
- Token Turing Machines (CVPR 2023)
Awards
- SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention received the Best Paper Award in Robot Manipulation at ICRA 2024
- Diffusion Illusions: Hiding Images in Plain Sight received CVPR 2023 Outstanding Demo Award
Research Experience
Currently an associate professor in the Department of Computer Science at Stony Brook University; former assistant professor at Indiana University Bloomington; former staff researcher within the Robotics Section of NASA's Jet Propulsion Laboratory (JPL).
Education
Ph.D. from the University of Texas at Austin in 2008; B.S. from Korea Advanced Institute of Science and Technology (KAIST) in 2004.
Background
Research interests include robotics, computer vision, and artificial intelligence. Worked with the AI research team at Salesforce, and previously spent 5.5 years with the robotics team at Google DeepMind (formerly Google Brain).