The paper 'Grounding Everything: Emerging Localization Properties in Vision-Language Transformers' was accepted at CVPR 2024. Developed several open-source libraries, including MaskInversion, LeGrad, GEM (Grounding Everything Method), and Data Stream. Involved in research projects including DEX-AR, MaskInversion, and LeGrad.
Research Experience
Spent summer 2024 at MIT CSAIL as a visiting scholar, working with Hendrik Strobelt and Angie Boggust. Attended the BMVA Symposium on Vision and Language, giving both an oral presentation and a poster. Gave a talk at Cohere For AI - Community Talks.
Education
Master of Engineering in Applied Mathematics from ENSTA Paris (France); Master of Science in Statistics and Applied Probabilities from the National University of Singapore (NUS); PhD student at Tübingen AI Center, advised by Prof. Hilde Kuehne.
Background
Primary research area: deep learning for multimodal models, including improving pretraining processes, understanding internal prediction mechanisms, and exploring zero-shot adaptation capabilities. Participating in the MIT-IBM Watson Sight and Sound Project.