SlowFast-VGen: Dual-speed action-driven video generation
VisVM: For VLM self-training
Motion Consistency Model: Accelerating video diffusion
Idea2Img: An LMM-based agent system for visual design and creation
IDOL: Joint video-depth generation for human dance videos
MM-Narrator: Audio description (AD) generation with GPT-4
DisCo: Human dance generation with disentangled controls
MaskComp: Completing visual objects
MPT: Human pose and mesh reconstruction
PaintSeg: Training-free segmentation
AdaM: Video matting
NVF: 3D hand pose estimation
LAVENDER: Unifying video-language understanding
ResT: Zero-shot action recognition
SwinBERT: Video captioning
AdaFuse: Efficient action recognition
VA-RED2: Efficient action recognition
AR-Net: Efficient action recognition
VIST: Video instance segmentation tracking
Research Experience
Works at Microsoft, as part of the Azure and OpenAI collaboration.
Background
Principal Researcher at Microsoft, focusing on pushing the boundaries of multimodal understanding and generation. Has worked in the fields of computer vision, machine learning, and statistical deep learning. Research interests include algorithms for visual perception (object recognition, localization, segmentation, tracking, etc.), representation learning, and the interaction of vision and language.