Published papers include 'Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods' (ECCV 2024), 'PREDITOR: Text Guided Image Editing with Diffusion Prior' (Preprint 2023), 'Controlled and Conditional Text to Image Generation with Diffusion Prior' (Preprint 2023), and 'Cross-Modal Coherence Model for Text to Image Retrieval' (AAAI 2022).
Research Experience
Senior Applied Research Scientist on Adobe's Applied Research team, working on language-vision research and generative AI. Led several projects, including Adobe's Firefly Image Model 3, zero-shot stylized image generation, and Structure Match.
Education
PhD from the Computer Science department at Rutgers University's Intelligent Visual Interfaces lab, supervised by Dr. Mubbasir Kapadia and Dr. Gerard De Melo.
Background
Research interests include vision-language understanding, particularly multimodal story comprehension, story illustration, visual storytelling, image captioning, and text-to-image retrieval/generation. Current work focuses on text-to-image generation with diffusion- and flow-based models, interactive editing, and large multimodal models.
Miscellany
Looking for summer PhD research interns to work on text-to-image generation, interactive and multi-turn editing, and multimodal models.