- Unifying Vision, Text, and Layout for Universal Document Processing (CVPR 2023)
- i-Code Studio: A Configurable and Composable Framework for Integrative AI (EMNLP 2024 System Demonstrations)
- i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data (NAACL 2024)
- i-Code: An Integrative and Composable Multimodal Learning Framework (AAAI 2023)
- MACSum: Controllable Summarization with Mixed Attributes (TACL 2023)
Research Experience
Principal Research Scientist at Zoom AI. Previously worked at Snap Research and Microsoft Azure AI.
Background
Research interests are in multimodal generation and NLP. Particularly interested in building a unified system that can ground itself in and reason over diverse external world knowledge, enabling multilingual human-machine communication.