May 2025: Preprint 'Token-Efficient Long Video Understanding for Multimodal LLMs' released on arXiv
Sep 2024: 'Slot State Space Models' accepted to NeurIPS 2024
Feb 2024: 'Layout-Agnostic Scene Text Image Synthesis with Diffusion Models' accepted to CVPR 2024
Sep 2023: 'Object-Centric Slot Diffusion' accepted to NeurIPS 2023 as a Spotlight paper (top 3%)
Apr 2023: Related work accepted to CVPR 2023 GCV Workshop
2020: 'Generative Neurosymbolic Machines' accepted to NeurIPS 2020 as a Spotlight paper (top 4%)
2020: Co-authored 'Improving Generative Imagination in Object-Centric World Models' published at ICML 2020
2020: Co-authored 'SCALOR: Generative World Models with Scalable Object Representations' published at ICLR 2020
Background
Currently a Research Scientist at NVIDIA Research
Research interests lie at the intersection of representation learning and visual reasoning
Focused on developing novel architectures to enhance agents' visual reasoning capabilities
Long-term goal is to build AI agents capable of human-like reasoning: uncovering latent structures of the physical world, predicting future scenarios, inferring causal or correlational relationships between events, and performing logical planning
Current focus areas include Multimodal LLMs, Vision Foundation Models, and their synergy